The most powerful BoxLite pattern: let an LLM generate code and execute it in a sandbox. The LLM reasons about the problem, writes Python, and BoxLite runs it safely. This tutorial shows how to wire up tool calling with three popular providers.

How it works

Every LLM provider supports tool calling (also called function calling). You define a tool called execute_python that accepts a code parameter. When the LLM decides it needs to compute something, it returns a tool call instead of a text response. Your code executes that tool call inside a BoxLite CodeBox and feeds the output back to the LLM, which then formulates the final answer. The pattern is the same regardless of provider:
  1. Define the tool schema (what “execute code” means)
  2. Send messages to the LLM with the tool available
  3. When the LLM returns a tool call, run the code in BoxLite
  4. Send the result back to the LLM
  5. Repeat until the LLM responds with text

Prerequisites

pip install boxlite openai anthropic
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

For the Vercel AI SDK example (Node.js):

npm install ai @ai-sdk/openai zod @boxlite-ai/boxlite

OpenAI

openai_sandbox.py
import asyncio
import json
import boxlite
from openai import AsyncOpenAI

EXECUTE_CODE_TOOL = {
    "type": "function",
    "function": {
        "name": "execute_python",
        "description": "Execute Python code in a secure sandbox. Use this to run calculations, data analysis, or any Python code. The code's stdout is returned.",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "Python code to execute. Use print() to output results.",
                }
            },
            "required": ["code"],
        },
    },
}


async def run_agent(question: str):
    client = AsyncOpenAI()

    async with boxlite.CodeBox() as codebox:
        messages = [
            {
                "role": "system",
                "content": "You are a helpful assistant. When you need to compute something, write Python code using the execute_python tool. Always print() your results.",
            },
            {"role": "user", "content": question},
        ]

        while True:
            response = await client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                tools=[EXECUTE_CODE_TOOL],
            )

            choice = response.choices[0]

            if choice.finish_reason == "tool_calls":
                messages.append(choice.message)
                for tool_call in choice.message.tool_calls:
                    args = json.loads(tool_call.function.arguments)
                    code = args["code"]

                    print(f"--- Executing code ---\n{code}\n---")
                    result = await codebox.run(code)
                    print(f"Output: {result}")

                    messages.append({
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": result,
                    })
            else:
                print(f"\nAnswer: {choice.message.content}")
                break


if __name__ == "__main__":
    asyncio.run(run_agent("What's the 50th Fibonacci number?"))
Expected output:
--- Executing code ---
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib(50))
---
Output: 12586269025

Answer: The 50th Fibonacci number is 12,586,269,025.
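For reference, after one tool round the messages list in the loop above has the following shape, shown here as plain dicts with a hypothetical tool-call id (the real loop appends the SDK's message object directly, which serializes to the same structure):

```python
import json

# Hypothetical transcript after one execute_python round.
messages = [
    {"role": "system", "content": "You are a helpful assistant. ..."},
    {"role": "user", "content": "What's the 50th Fibonacci number?"},
    {   # The assistant's turn: no text, just a tool call.
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_abc123",  # hypothetical id assigned by the API
            "type": "function",
            "function": {
                "name": "execute_python",
                # Note: arguments is a JSON *string*, not a dict.
                "arguments": json.dumps({"code": "print(12586269025)"}),
            },
        }],
    },
    {   # The tool result, matched to the call by tool_call_id.
        "role": "tool",
        "tool_call_id": "call_abc123",
        "content": "12586269025\n",
    },
]

# Because arguments arrives as a JSON string, it must be parsed before use,
# which is why the loop calls json.loads() on it:
args = json.loads(messages[2]["tool_calls"][0]["function"]["arguments"])
print(args["code"])  # the code the model asked to run
```

The tool message must reference the exact id from the assistant's tool call, or the API will reject the request.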

Anthropic

Anthropic uses a different tool schema format (input_schema instead of parameters) and a different response structure (stop_reason instead of finish_reason, content blocks instead of tool_calls).
anthropic_sandbox.py
import asyncio
import boxlite
from anthropic import AsyncAnthropic

EXECUTE_CODE_TOOL = {
    "name": "execute_python",
    "description": "Execute Python code in a secure sandbox. Use this to run calculations, data analysis, or any Python code. The code's stdout is returned.",
    "input_schema": {
        "type": "object",
        "properties": {
            "code": {
                "type": "string",
                "description": "Python code to execute. Use print() to output results.",
            }
        },
        "required": ["code"],
    },
}


async def run_agent(question: str):
    client = AsyncAnthropic()

    async with boxlite.CodeBox() as codebox:
        messages = [{"role": "user", "content": question}]

        while True:
            response = await client.messages.create(
                model="claude-sonnet-4-5-20250929",
                max_tokens=4096,
                system="You are a helpful assistant. When you need to compute something, write Python code using the execute_python tool. Always print() your results.",
                tools=[EXECUTE_CODE_TOOL],
                messages=messages,
            )

            # Serialize content blocks to plain dicts for the messages array
            assistant_content = []
            for block in response.content:
                if block.type == "text":
                    assistant_content.append({"type": "text", "text": block.text})
                elif block.type == "tool_use":
                    assistant_content.append({
                        "type": "tool_use",
                        "id": block.id,
                        "name": block.name,
                        "input": block.input,
                    })
            messages.append({"role": "assistant", "content": assistant_content})

            if response.stop_reason == "tool_use":
                tool_results = []
                for block in response.content:
                    if block.type == "tool_use":
                        code = block.input["code"]

                        print(f"--- Executing code ---\n{code}\n---")
                        result = await codebox.run(code)
                        print(f"Output: {result}")

                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": result,
                        })

                messages.append({"role": "user", "content": tool_results})
            else:
                # Extract final text response
                for block in response.content:
                    if hasattr(block, "text"):
                        print(f"\nAnswer: {block.text}")
                break


if __name__ == "__main__":
    asyncio.run(run_agent("What's the 50th Fibonacci number?"))
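The two schema formats carry the same JSON Schema payload, so a tool defined once can be converted mechanically rather than maintained twice. A small sketch (a helper of our own, not part of either SDK) that turns the OpenAI-style definition into the Anthropic shape:

```python
def openai_tool_to_anthropic(tool: dict) -> dict:
    """Convert an OpenAI function-tool definition to Anthropic's format.

    OpenAI nests everything under "function" and calls the JSON Schema
    "parameters"; Anthropic flattens the dict and calls it "input_schema".
    """
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn["description"],
        "input_schema": fn["parameters"],
    }


# Example: the execute_python tool from the OpenAI section.
OPENAI_TOOL = {
    "type": "function",
    "function": {
        "name": "execute_python",
        "description": "Execute Python code in a secure sandbox.",
        "parameters": {
            "type": "object",
            "properties": {"code": {"type": "string"}},
            "required": ["code"],
        },
    },
}

anthropic_tool = openai_tool_to_anthropic(OPENAI_TOOL)
print(anthropic_tool["input_schema"]["required"])  # ['code']
```

This keeps one source of truth for the tool definition if you support both providers in the same codebase.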

Vercel AI SDK

The Vercel AI SDK provides a unified tool() helper that handles schema validation and execution in one place. The maxSteps parameter replaces the manual while loop — the SDK automatically re-calls the model when a tool result is returned.
vercel-ai-sandbox.ts
import { CodeBox } from '@boxlite-ai/boxlite';
import { generateText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

async function runAgent(question: string) {
  const codebox = new CodeBox();

  try {
    const { text } = await generateText({
      model: openai('gpt-4o'),
      system: 'You are a helpful assistant. When you need to compute something, write Python code using the execute_python tool. Always print() your results.',
      prompt: question,
      tools: {
        execute_python: tool({
          description: 'Execute Python code in a secure sandbox. Use print() to output results.',
          parameters: z.object({
            code: z.string().describe('Python code to execute.'),
          }),
          execute: async ({ code }) => {
            console.log(`--- Executing code ---\n${code}\n---`);
            const result = await codebox.run(code);
            console.log(`Output: ${result}`);
            return result;
          },
        }),
      },
      maxSteps: 5,
    });

    console.log(`\nAnswer: ${text}`);
  } finally {
    await codebox.stop();
  }
}

runAgent("What's the 50th Fibonacci number?");
The Vercel AI SDK supports many model providers — swap openai('gpt-4o') for anthropic('claude-sonnet-4-5-20250929'), google('gemini-2.0-flash'), or any other supported model. The tool definitions stay the same.

Multiple questions

The CodeBox persists across calls, so installed packages and files on disk carry over between tool invocations within the same conversation. Note that in-memory state (variables, functions) does not persist — each run() is a separate Python process.
multi_question.py
async def main():
    questions = [
        "What's the standard deviation of [23, 45, 67, 89, 12, 34, 56]?",
        "Generate a random 8-character password with letters and digits.",
        "What day of the week was January 1, 2000?",
    ]

    for question in questions:
        print(f"\nQ: {question}")
        await run_agent(question)


if __name__ == "__main__":
    asyncio.run(main())
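The persistence rule above (files survive between run() calls, in-memory state does not) follows from each run() being a fresh interpreter process. The same behavior can be demonstrated with plain subprocesses, no sandbox required:

```python
import os
import subprocess
import sys
import tempfile


def run(code: str) -> str:
    """Run code in a fresh Python process, analogous to each CodeBox.run()."""
    proc = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    return proc.stdout + proc.stderr


workdir = tempfile.mkdtemp()
state_file = os.path.join(workdir, "state.txt")

# First "call": define a variable and also write it to disk.
run(f"x = 42\nopen({state_file!r}, 'w').write(str(x))")

# Second "call": the variable is gone, but the file survives.
out = run(f"print(open({state_file!r}).read())")
print(out.strip())  # 42

err = run("print(x)")  # NameError: x lived in a different process
print("NameError" in err)  # True
```

The practical upshot: to carry state between tool invocations, have the model write intermediate results to files (or reinstall-free packages), not rely on variables from a previous run.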
For production deployments — concurrency models, timeout handling, security presets, and defensive execution patterns — see the AI agent integration guide.

What’s next?