ComputerBox runs a full Linux desktop (XFCE) inside an isolated VM and gives you programmatic control over the screen, mouse, and keyboard. Use it when you need GUI automation — filling out forms, testing desktop apps, or driving tools that don’t have a CLI.
Need browser automation only? Use BrowserBox instead — it’s lighter and gives you direct Playwright/Puppeteer access without a full desktop.

What you’ll build

A script that:
  1. Launches a ComputerBox with a GUI desktop
  2. Takes a screenshot
  3. Uses the mouse and keyboard to interact with the desktop
  4. Automates a complete workflow: open a text editor, type content, and save

Prerequisites

pip install boxlite
Requires Python 3.10+.

Step 1: Launch and take a screenshot

Create a ComputerBox, wait for the desktop to be ready, and capture a screenshot.
screenshot.py
import asyncio
import base64
from boxlite import ComputerBox


async def main():
    async with ComputerBox() as desktop:
        # Wait for the GUI desktop to fully load
        await desktop.wait_until_ready()
        print("Desktop is ready")

        # Take a screenshot
        screenshot = await desktop.screenshot()
        print(f"Screen size: {screenshot['width']}x{screenshot['height']}")

        # Save the screenshot to a file
        with open("desktop.png", "wb") as f:
            f.write(base64.b64decode(screenshot["data"]))
        print("Screenshot saved to desktop.png")


if __name__ == "__main__":
    asyncio.run(main())
What’s happening:
  • ComputerBox boots a VM with the lscr.io/linuxserver/webtop:ubuntu-xfce image — a full Ubuntu desktop with XFCE
  • wait_until_ready() blocks until the desktop environment is loaded and responsive
  • screenshot() returns a dictionary with the screen width and height plus the current screen contents as a base64-encoded PNG under the data key
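Because the script reads width, height, and data from the result, the decode-and-save step can be wrapped in a small helper that also checks the bytes really are a PNG. This is a minimal sketch that assumes only the dictionary shape used in screenshot.py; save_screenshot is a hypothetical helper name, not part of boxlite.

```python
import base64
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first eight bytes of every PNG file


def save_screenshot(shot: dict, path: str) -> int:
    """Decode shot["data"] (base64-encoded PNG) and write it to path.

    Returns the number of bytes written. Raises ValueError if the decoded
    bytes do not start with the PNG signature.
    """
    raw = base64.b64decode(shot["data"])
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("screenshot data is not a PNG image")
    Path(path).write_bytes(raw)
    return len(raw)
```

Inside the async with block this would be called as save_screenshot(await desktop.screenshot(), "desktop.png").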

Step 2: Mouse interaction

Move the cursor, click, double-click, and drag.
mouse.py
import asyncio
from boxlite import ComputerBox


async def main():
    async with ComputerBox() as desktop:
        await desktop.wait_until_ready()

        # Move the mouse to coordinates (512, 384)
        await desktop.mouse_move(512, 384)

        # Check where the cursor is
        x, y = await desktop.cursor_position()
        print(f"Cursor is at ({x}, {y})")

        # Click at the current position
        await desktop.left_click()

        # Double-click to open an item
        await desktop.double_click()

        # Right-click for context menu
        await desktop.right_click()

        # Drag from one position to another
        await desktop.left_click_drag(100, 100, 400, 400)

        print("Mouse interactions complete")


if __name__ == "__main__":
    asyncio.run(main())

Mouse methods reference

| Method (Python) | Method (Node.js) | Description |
| --- | --- | --- |
| mouse_move(x, y) | mouseMove(x, y) | Move cursor to coordinates |
| left_click() | leftClick() | Left click at current position |
| right_click() | rightClick() | Right click at current position |
| double_click() | doubleClick() | Double left click |
| triple_click() | tripleClick() | Triple left click (select line) |
| left_click_drag(sx, sy, ex, ey) | leftClickDrag(sx, sy, ex, ey) | Drag from start to end |
| cursor_position() | cursorPosition() | Get current cursor (x, y) |
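Hard-coded coordinates such as (512, 384) only make sense for one screen size. The sketch below derives targets from the dimensions reported by screenshot() instead. It assumes coordinates are pixels with the origin at the top-left corner, which the mouse examples suggest but do not state outright. clamp_point and screen_center are hypothetical helper names, not boxlite functions.

```python
def clamp_point(x: int, y: int, width: int, height: int) -> tuple[int, int]:
    """Clamp (x, y) into the rectangle [0, width - 1] x [0, height - 1]."""
    return max(0, min(x, width - 1)), max(0, min(y, height - 1))


def screen_center(width: int, height: int) -> tuple[int, int]:
    """Return the pixel at the middle of a width x height screen."""
    return width // 2, height // 2
```

With a screenshot result named shot, a call like await desktop.mouse_move(*screen_center(shot["width"], shot["height"])) would then target the middle of the screen whatever the resolution.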

Step 3: Keyboard input

Type text and press key combinations.
keyboard.py
import asyncio
from boxlite import ComputerBox


async def main():
    async with ComputerBox() as desktop:
        await desktop.wait_until_ready()

        # Type text characters
        await desktop.type("Hello from BoxLite!")

        # Press Enter
        await desktop.key("Return")

        # Press a key combination (Ctrl+A to select all)
        await desktop.key("ctrl+a")

        # Press Ctrl+C to copy
        await desktop.key("ctrl+c")

        # Switch windows with Alt+Tab
        await desktop.key("alt+Tab")

        print("Keyboard interactions complete")


if __name__ == "__main__":
    asyncio.run(main())

Key syntax reference (xdotool format)

The key() method uses xdotool key syntax:
| Key | Syntax |
| --- | --- |
| Enter | Return |
| Tab | Tab |
| Escape | Escape |
| Backspace | BackSpace |
| Delete | Delete |
| Space | space |
| Arrow keys | Up, Down, Left, Right |
| Function keys | F1, F2, … F12 |
| Page keys | Page_Up, Page_Down, Home, End |
| Modifiers | ctrl, alt, shift, super |
| Combinations | ctrl+c, ctrl+shift+s, alt+Tab |
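Combination strings are easy to mistype when assembled by hand. As a rough sketch, a helper can build them from individual key names, using the modifier names from the table above. combo is a hypothetical helper, not part of boxlite, and it does not check that non-modifier names are valid keys.

```python
MODIFIERS = ("ctrl", "alt", "shift", "super")  # modifier names from the key syntax table


def combo(*keys: str) -> str:
    """Join key names into a "+"-separated combination such as "ctrl+shift+s".

    Modifiers are moved to the front in the order listed in MODIFIERS;
    all other keys keep the order in which they were passed.
    """
    if not keys:
        raise ValueError("at least one key is required")
    for name in keys:
        if not name or "+" in name or " " in name:
            raise ValueError(f"invalid key name: {name!r}")
    mods = [m for m in MODIFIERS if m in keys]
    rest = [k for k in keys if k not in MODIFIERS]
    return "+".join(mods + rest)
```

For example, await desktop.key(combo("shift", "ctrl", "s")) would send ctrl+shift+s.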

Step 4: Automate a full workflow

Put it all together — open a text editor, type content, save the file, and verify with a screenshot.
workflow.py
import asyncio
import base64
from boxlite import ComputerBox


async def main():
    async with ComputerBox(cpu=2, memory=2048) as desktop:
        await desktop.wait_until_ready()

        # Open a terminal with Ctrl+Alt+T, then launch the Mousepad
        # text editor (comes with XFCE) from the command line
        await desktop.key("ctrl+alt+t")  # Open terminal
        await asyncio.sleep(2)
        await desktop.type("mousepad /tmp/notes.txt &")
        await desktop.key("Return")
        await asyncio.sleep(3)

        # Type content into the editor
        await desktop.type("Meeting Notes")
        await desktop.key("Return")
        await desktop.key("Return")
        await desktop.type("1. Review Q4 targets")
        await desktop.key("Return")
        await desktop.type("2. Plan sprint backlog")
        await desktop.key("Return")
        await desktop.type("3. Assign action items")

        # Save the file with Ctrl+S
        await desktop.key("ctrl+s")
        await asyncio.sleep(1)

        # Take a screenshot to verify
        screenshot = await desktop.screenshot()
        with open("workflow_result.png", "wb") as f:
            f.write(base64.b64decode(screenshot["data"]))
        print("Workflow complete — screenshot saved to workflow_result.png")


if __name__ == "__main__":
    asyncio.run(main())
You can also view the desktop live in your browser at http://localhost:3000 (or whatever port you set with gui_http_port). This is helpful for debugging automation scripts — watch what’s happening in real time while your code runs.
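The alternating type() and key("Return") calls in workflow.py can be folded into one loop. The sketch below relies only on the type() and key() methods shown earlier. type_lines is a hypothetical helper name, and the optional pause is an assumption for desktops that drop keystrokes when input arrives too quickly.

```python
import asyncio


async def type_lines(desktop, lines: list[str], pause: float = 0.0) -> None:
    """Type each entry of lines, pressing Return between entries.

    An empty string produces a blank line. No Return is sent after the
    last entry, which matches the sequence of calls in workflow.py.
    """
    for index, line in enumerate(lines):
        if index > 0:
            await desktop.key("Return")
        if line:
            await desktop.type(line)
        if pause:
            await asyncio.sleep(pause)
```

The editor content from the workflow would then be typed with await type_lines(desktop, ["Meeting Notes", "", "1. Review Q4 targets", "2. Plan sprint backlog", "3. Assign action items"]).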

Constructor options

| Parameter | Python | Node.js | Default | Description |
| --- | --- | --- | --- | --- |
| CPU cores | cpu | cpus | 2 | Number of CPU cores |
| Memory | memory | memoryMib | 2048 | Memory in MiB |
| HTTP port | gui_http_port | guiHttpPort | 3000 | Port for browser-based desktop access |
| HTTPS port | gui_https_port | guiHttpsPort | 3001 | Port for secure desktop access |
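If several desktops run at the same time, they presumably cannot all use the default ports 3000 and 3001. That is an assumption, since this page does not describe port conflicts. A minimal sketch of one way to hand each desktop its own pair of ports, using the Python parameter names from the table, follows. options_for is a hypothetical helper, not part of boxlite.

```python
def options_for(index: int, cpu: int = 2, memory: int = 2048) -> dict:
    """Build ComputerBox keyword arguments for the index-th desktop.

    Desktop 0 gets the default ports 3000/3001, desktop 1 gets 3002/3003,
    and so on, so no two desktops are given the same port.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    base = 3000 + 2 * index
    return {
        "cpu": cpu,
        "memory": memory,
        "gui_http_port": base,      # browser-based desktop access
        "gui_https_port": base + 1,  # secure desktop access
    }
```

Each desktop would then be created with async with ComputerBox(**options_for(i)) as desktop:, and the second one would be viewable at http://localhost:3002.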

What’s next?