The most powerful BoxLite pattern: let an LLM generate code and execute it in a sandbox. The LLM reasons about the problem, writes Python, and BoxLite runs it safely. This tutorial shows how to wire up tool calling with three popular providers.
Every LLM provider supports tool calling (also called function calling). You define a tool named execute_python that accepts a code parameter. When the LLM decides it needs to compute something, it returns a tool call instead of a text response. Your code executes that tool call inside a BoxLite CodeBox and feeds the output back to the LLM, which then formulates the final answer. The pattern is the same regardless of provider:
Define the tool schema (what “execute code” means)
Send messages to the LLM with the tool available
When the LLM returns a tool call, run the code in BoxLite
Feed the sandbox output back as a tool result, and repeat until the LLM returns a final text answer
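Before wiring up a full provider loop, it helps to see what a tool call actually looks like on the wire. The sketch below shows the shape of an OpenAI-style tool call (the id and code are invented for illustration) and the one step that trips people up: the arguments arrive as a JSON-encoded string, not a dict.

```python
import json

# Shape of an OpenAI-style tool call (the id and code are invented for
# illustration). The model returns this instead of a plain text message.
tool_call = {
    "id": "call_abc123",
    "type": "function",
    "function": {
        "name": "execute_python",
        # Arguments arrive as a JSON-encoded string, not a dict.
        "arguments": '{"code": "print(2 ** 10)"}',
    },
}

# Decode the arguments before handing the code to the sandbox.
args = json.loads(tool_call["function"]["arguments"])
print(args["code"])  # print(2 ** 10)
```

The decoded `args["code"]` string is exactly what you pass to the sandbox in the examples below.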
openai_sandbox.py

```python
import asyncio
import json

import boxlite
from openai import AsyncOpenAI

EXECUTE_CODE_TOOL = {
    "type": "function",
    "function": {
        "name": "execute_python",
        "description": "Execute Python code in a secure sandbox. Use this to run calculations, data analysis, or any Python code. The code's stdout is returned.",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "Python code to execute. Use print() to output results.",
                }
            },
            "required": ["code"],
        },
    },
}


async def run_agent(question: str):
    client = AsyncOpenAI()

    async with boxlite.CodeBox() as codebox:
        messages = [
            {
                "role": "system",
                "content": "You are a helpful assistant. When you need to compute something, write Python code using the execute_python tool. Always print() your results.",
            },
            {"role": "user", "content": question},
        ]

        while True:
            response = await client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                tools=[EXECUTE_CODE_TOOL],
            )
            choice = response.choices[0]

            if choice.finish_reason == "tool_calls":
                messages.append(choice.message)
                for tool_call in choice.message.tool_calls:
                    args = json.loads(tool_call.function.arguments)
                    code = args["code"]
                    print(f"--- Executing code ---\n{code}\n---")
                    result = await codebox.run(code)
                    print(f"Output: {result}")
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": result,
                    })
            else:
                print(f"\nAnswer: {choice.message.content}")
                break


if __name__ == "__main__":
    asyncio.run(run_agent("What's the 50th Fibonacci number?"))
```
openai-sandbox.ts
```typescript
import { CodeBox } from '@boxlite-ai/boxlite';
import OpenAI from 'openai';

const EXECUTE_CODE_TOOL: OpenAI.ChatCompletionTool = {
  type: 'function',
  function: {
    name: 'execute_python',
    description: 'Execute Python code in a secure sandbox. Use print() to output results.',
    parameters: {
      type: 'object',
      properties: {
        code: { type: 'string', description: 'Python code to execute.' },
      },
      required: ['code'],
    },
  },
};

async function runAgent(question: string) {
  const client = new OpenAI();
  const codebox = new CodeBox();

  try {
    const messages: OpenAI.ChatCompletionMessageParam[] = [
      { role: 'system', content: 'You are a helpful assistant. When you need to compute something, write Python code using the execute_python tool. Always print() your results.' },
      { role: 'user', content: question },
    ];

    while (true) {
      const response = await client.chat.completions.create({
        model: 'gpt-4o',
        messages,
        tools: [EXECUTE_CODE_TOOL],
      });
      const choice = response.choices[0];

      if (choice.finish_reason === 'tool_calls') {
        messages.push(choice.message);
        for (const toolCall of choice.message.tool_calls!) {
          const { code } = JSON.parse(toolCall.function.arguments);
          console.log(`--- Executing code ---\n${code}\n---`);
          const result = await codebox.run(code);
          console.log(`Output: ${result}`);
          messages.push({
            role: 'tool',
            tool_call_id: toolCall.id,
            content: result,
          });
        }
      } else {
        console.log(`\nAnswer: ${choice.message.content}`);
        break;
      }
    }
  } finally {
    await codebox.stop();
  }
}

runAgent("What's the 50th Fibonacci number?");
```
Expected output:
```
--- Executing code ---
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib(50))
---
Output: 12586269025

Answer: The 50th Fibonacci number is 12,586,269,025.
```
Anthropic uses a different tool schema format (input_schema instead of parameters) and a different response structure (stop_reason instead of finish_reason, content blocks instead of tool_calls).
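The two formats wrap the same JSON Schema; only the envelope around it differs. A minimal comparison (descriptions elided for brevity) makes this concrete:

```python
# One JSON Schema for the tool's input, shared by both providers.
schema = {
    "type": "object",
    "properties": {"code": {"type": "string"}},
    "required": ["code"],
}

# OpenAI: nested under "function", schema keyed as "parameters".
openai_tool = {
    "type": "function",
    "function": {"name": "execute_python", "parameters": schema},
}

# Anthropic: flat, schema keyed as "input_schema".
anthropic_tool = {"name": "execute_python", "input_schema": schema}

# The schema itself is identical; only the wrapper differs.
print(openai_tool["function"]["parameters"] == anthropic_tool["input_schema"])  # True
```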
Python
Node.js
anthropic_sandbox.py
```python
import asyncio

import boxlite
from anthropic import AsyncAnthropic

EXECUTE_CODE_TOOL = {
    "name": "execute_python",
    "description": "Execute Python code in a secure sandbox. Use this to run calculations, data analysis, or any Python code. The code's stdout is returned.",
    "input_schema": {
        "type": "object",
        "properties": {
            "code": {
                "type": "string",
                "description": "Python code to execute. Use print() to output results.",
            }
        },
        "required": ["code"],
    },
}


async def run_agent(question: str):
    client = AsyncAnthropic()

    async with boxlite.CodeBox() as codebox:
        messages = [{"role": "user", "content": question}]

        while True:
            response = await client.messages.create(
                model="claude-sonnet-4-5-20250929",
                max_tokens=4096,
                system="You are a helpful assistant. When you need to compute something, write Python code using the execute_python tool. Always print() your results.",
                tools=[EXECUTE_CODE_TOOL],
                messages=messages,
            )

            # Serialize content blocks to plain dicts for the messages array
            assistant_content = []
            for block in response.content:
                if block.type == "text":
                    assistant_content.append({"type": "text", "text": block.text})
                elif block.type == "tool_use":
                    assistant_content.append({
                        "type": "tool_use",
                        "id": block.id,
                        "name": block.name,
                        "input": block.input,
                    })
            messages.append({"role": "assistant", "content": assistant_content})

            if response.stop_reason == "tool_use":
                tool_results = []
                for block in response.content:
                    if block.type == "tool_use":
                        code = block.input["code"]
                        print(f"--- Executing code ---\n{code}\n---")
                        result = await codebox.run(code)
                        print(f"Output: {result}")
                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": result,
                        })
                messages.append({"role": "user", "content": tool_results})
            else:
                # Extract final text response
                for block in response.content:
                    if hasattr(block, "text"):
                        print(f"\nAnswer: {block.text}")
                break


if __name__ == "__main__":
    asyncio.run(run_agent("What's the 50th Fibonacci number?"))
```
anthropic-sandbox.ts
```typescript
import { CodeBox } from '@boxlite-ai/boxlite';
import Anthropic from '@anthropic-ai/sdk';

const EXECUTE_CODE_TOOL: Anthropic.Tool = {
  name: 'execute_python',
  description: 'Execute Python code in a secure sandbox. Use print() to output results.',
  input_schema: {
    type: 'object' as const,
    properties: {
      code: { type: 'string', description: 'Python code to execute.' },
    },
    required: ['code'],
  },
};

async function runAgent(question: string) {
  const client = new Anthropic();
  const codebox = new CodeBox();

  try {
    const messages: Anthropic.MessageParam[] = [
      { role: 'user', content: question },
    ];

    while (true) {
      const response = await client.messages.create({
        model: 'claude-sonnet-4-5-20250929',
        max_tokens: 4096,
        system: 'You are a helpful assistant. When you need to compute something, write Python code using the execute_python tool. Always print() your results.',
        tools: [EXECUTE_CODE_TOOL],
        messages,
      });

      messages.push({ role: 'assistant', content: response.content });

      if (response.stop_reason === 'tool_use') {
        const toolResults: Anthropic.ToolResultBlockParam[] = [];
        for (const block of response.content) {
          if (block.type === 'tool_use') {
            const { code } = block.input as { code: string };
            console.log(`--- Executing code ---\n${code}\n---`);
            const result = await codebox.run(code);
            console.log(`Output: ${result}`);
            toolResults.push({
              type: 'tool_result',
              tool_use_id: block.id,
              content: result,
            });
          }
        }
        messages.push({ role: 'user', content: toolResults });
      } else {
        for (const block of response.content) {
          if (block.type === 'text') {
            console.log(`\nAnswer: ${block.text}`);
          }
        }
        break;
      }
    }
  } finally {
    await codebox.stop();
  }
}

runAgent("What's the 50th Fibonacci number?");
```
The Vercel AI SDK provides a unified tool() helper that handles schema validation and execution in one place. The maxSteps parameter replaces the manual while loop — the SDK automatically re-calls the model when a tool result is returned.
vercel-ai-sandbox.ts
```typescript
import { CodeBox } from '@boxlite-ai/boxlite';
import { generateText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

async function runAgent(question: string) {
  const codebox = new CodeBox();

  try {
    const { text } = await generateText({
      model: openai('gpt-4o'),
      system: 'You are a helpful assistant. When you need to compute something, write Python code using the execute_python tool. Always print() your results.',
      prompt: question,
      tools: {
        execute_python: tool({
          description: 'Execute Python code in a secure sandbox. Use print() to output results.',
          parameters: z.object({
            code: z.string().describe('Python code to execute.'),
          }),
          execute: async ({ code }) => {
            console.log(`--- Executing code ---\n${code}\n---`);
            const result = await codebox.run(code);
            console.log(`Output: ${result}`);
            return result;
          },
        }),
      },
      maxSteps: 5,
    });

    console.log(`\nAnswer: ${text}`);
  } finally {
    await codebox.stop();
  }
}

runAgent("What's the 50th Fibonacci number?");
```
The Vercel AI SDK supports many model providers — swap openai('gpt-4o') for anthropic('claude-sonnet-4-5-20250929'), google('gemini-2.0-flash'), or any other supported model. The tool definitions stay the same.
The CodeBox persists across calls, so installed packages and files on disk carry over between tool invocations within the same conversation. Note that in-memory state (variables, functions) does not persist — each run() is a separate Python process.
multi_question.py
```python
async def main():
    questions = [
        "What's the standard deviation of [23, 45, 67, 89, 12, 34, 56]?",
        "Generate a random 8-character password with letters and digits.",
        "What day of the week was January 1, 2000?",
    ]
    for question in questions:
        print(f"\nQ: {question}")
        await run_agent(question)


if __name__ == "__main__":
    asyncio.run(main())
```
For production deployments — concurrency models, timeout handling, security presets, and defensive execution patterns — see the AI agent integration guide.