Agents & tools

Computer Use

The capability for an AI model to control a computer the way a human does — moving the mouse, clicking, typing, reading the screen.

01 ——

In plain English

Computer Use began as Anthropic's name for a Claude capability released in public beta in October 2024, and now also names the broader pattern of an AI model operating a desktop environment through screenshots, keyboard, and mouse. The model receives a screenshot, decides what to click or type, and emits synthetic input events; the cycle repeats at every step of the task.
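The cycle described above (see the screen, decide, send input) can be sketched in a few lines. This is a minimal illustration, not any vendor's API: `Action`, `FakeDesktop`, `scripted_policy`, and `run_agent` are hypothetical names, the "desktop" only records events, and the "model" is a fixed script standing in for a real model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    """One synthetic input event requested by the model (hypothetical type)."""
    kind: str                              # "click", "type", or "done"
    xy: Optional[Tuple[int, int]] = None   # target coordinates for a click
    text: str = ""                         # characters to type


@dataclass
class FakeDesktop:
    """Stand-in for a real screen: records events instead of executing them."""
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> bytes:
        # A real environment would return encoded image bytes of the screen.
        return f"frame-{len(self.log)}".encode()

    def apply(self, action: Action) -> None:
        if action.kind == "click":
            self.log.append(f"click@{action.xy}")
        elif action.kind == "type":
            self.log.append(f"type:{action.text}")


def scripted_policy(goal: str, screenshot: bytes, step: int) -> Action:
    """Placeholder for the model call: replays a fixed three-step plan."""
    plan = [
        Action("click", xy=(120, 48)),   # focus a (pretend) search box
        Action("type", text=goal),       # type the goal into it
        Action("done"),                  # signal that the task is finished
    ]
    return plan[min(step, len(plan) - 1)]


def run_agent(
    goal: str,
    desktop: FakeDesktop,
    policy: Callable[[str, bytes, int], Action],
    max_steps: int = 10,
) -> int:
    """Observe, decide, act. Returns the number of actions executed."""
    for step in range(max_steps):
        shot = desktop.screenshot()          # observe
        action = policy(goal, shot, step)    # decide
        if action.kind == "done":
            return step
        desktop.apply(action)                # act
    return max_steps                         # step budget exhausted


desktop = FakeDesktop()
steps = run_agent("quarterly report", desktop, scripted_policy)
print(steps, desktop.log)
# prints: 2 ['click@(120, 48)', 'type:quarterly report']
```

The `max_steps` cap is a design choice worth keeping in any real loop: without it, a model that never emits "done" would keep taking screenshots and acting indefinitely.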

Why it's significant: APIs are common, but much enterprise software can only be reached through a GUI. Computer Use lets an agent operate applications such as legacy ERPs, design tools, spreadsheets, and video editors without a dedicated integration, which makes the GUI the most general interface available.

Trade-offs:

Universal: works on any software with a UI
Slow: every step costs a screenshot plus a model reasoning round-trip
Fragile: pixel coordinates go stale when the theme or resolution changes
Risky: a misclick in a live environment has real consequences, such as deleted files or sent emails
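One concrete source of the fragility noted above is a mismatch between the size of the screenshot the model sees and the size of the real screen. The sketch below assumes the model returns coordinates in screenshot space and that the screenshot is a uniformly scaled copy of the full screen; `to_screen` is a hypothetical helper, not part of any product's API. It handles resolution changes only and does nothing for theme changes.

```python
from typing import Tuple

Size = Tuple[int, int]    # (width, height) in pixels
Point = Tuple[int, int]   # (x, y) in pixels


def to_screen(point: Point, shot_size: Size, screen_size: Size) -> Point:
    """Map a point from screenshot space to real screen space."""
    x, y = point
    shot_w, shot_h = shot_size
    screen_w, screen_h = screen_size
    if not (0 <= x < shot_w and 0 <= y < shot_h):
        # Reject coordinates the model could not have seen in the screenshot.
        raise ValueError(f"{point} lies outside a {shot_w}x{shot_h} screenshot")
    return (round(x * screen_w / shot_w), round(y * screen_h / shot_h))


# The same model output lands on different physical pixels per display.
target = (640, 400)                                      # chosen on a 1280x800 screenshot
print(to_screen(target, (1280, 800), (2560, 1600)))      # prints: (1280, 800)
print(to_screen(target, (1280, 800), (1920, 1200)))      # prints: (960, 600)
```

Replaying the raw `(640, 400)` on either display would click the wrong spot, which is why stored coordinates tend not to survive a resolution change.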

Implementations: Anthropic Computer Use, OpenAI Operator (a browser-only variant), Google Project Mariner, and the open-source Browser Use and Skyvern. Production deployments typically run in sandboxed VMs so that mistakes stay contained.

02 ——