Anthropic Turns MCP Servers Into Code APIs to Cut Token Costs
Anthropic proposes running model-generated TypeScript against filesystem-based MCP APIs so that large tool outputs stay out of the model context, cutting token usage by orders of magnitude.
Why MCP agents struggle at scale
Agents that use the Model Context Protocol face a practical scaling problem: every tool definition and every intermediate result passes through the model context. When workflows involve many tools or large payloads, token consumption, latency, and cost spike, and context limits are reached quickly.
The problem with direct MCP tool calls
MCP is an open standard that exposes external systems to models via servers that list tools. In the default pattern an agent loads many tool definitions into the model context. Each tool includes schemas and metadata, and intermediate outputs from tool calls are streamed back into the context so the model can decide subsequent steps. That means large datasets can be read and then passed back again to other tools through the model, multiplying token usage without changing task logic.
How Anthropic rethinks MCP with code execution
Anthropic proposes a different pipeline: represent MCP servers as code-level APIs and run model-written code in a sandbox. The MCP client generates a filesystem structure that mirrors the available servers and tools. For each MCP tool the client creates a thin wrapper source file, for example servers/google-drive/getDocument.ts, which calls the MCP tool with typed parameters. The model is instructed to write TypeScript that imports these wrappers, orchestrates calls, and performs data handling inside the runtime, so the model no longer directly ingests large intermediate payloads.
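Such a generated wrapper might look like the sketch below. The `callMCPTool` helper, the tool name string, and the input/output shapes are illustrative assumptions, not Anthropic's actual generated code:

```typescript
// Hypothetical contents of servers/google-drive/getDocument.ts.
// Parameter shapes and the tool name are assumed for illustration.
interface GetDocumentInput {
  documentId: string;
}

interface GetDocumentOutput {
  content: string;
}

// Stand-in for the MCP client's transport layer; a real client would
// forward this call to the google-drive MCP server over the protocol.
async function callMCPTool(tool: string, input: unknown): Promise<unknown> {
  const { documentId } = input as GetDocumentInput;
  return { content: `contents of ${documentId} via ${tool}` };
}

// The thin, typed wrapper the model imports instead of loading the
// tool's full schema into its context.
export async function getDocument(
  input: GetDocumentInput
): Promise<GetDocumentOutput> {
  return (await callMCPTool(
    "google_drive__get_document",
    input
  )) as GetDocumentOutput;
}
```

The point of the wrapper is that only its small typed signature needs to be read; the schema and transport details stay out of the model context.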
A concrete example
In the previous pattern a Google Drive transcript might be returned through the model and then passed again when calling a Salesforce tool, costing tens of thousands of tokens. Under the code execution pattern a short TypeScript script calls the Google Drive wrapper, processes the transcript locally inside the execution environment, and calls the Salesforce wrapper with only the required summary or small sample. The model only sees compact outputs instead of entire payloads.
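A minimal sketch of such an orchestration script, with both wrappers stubbed in-file (the function names, record fields, and IDs are hypothetical; in the real pattern they would be imported from the generated servers/ tree):

```typescript
// Stand-in for the servers/google-drive wrapper; imagine this
// returning a transcript with thousands of lines.
async function getTranscript(documentId: string): Promise<string> {
  return "line1\nline2\nline3";
}

// Stand-in for the servers/salesforce wrapper.
async function updateRecord(
  recordId: string,
  fields: Record<string, string>
): Promise<void> {
  // A real wrapper would call the salesforce MCP tool here.
}

// The full transcript never leaves the sandbox: the script processes
// it locally and forwards only a compact summary downstream.
async function main(): Promise<string> {
  const transcript = await getTranscript("doc-123");
  const lineCount = transcript.split("\n").length;
  const summary = `Transcript has ${lineCount} lines`;
  await updateRecord("sf-001", { Notes: summary });
  return summary; // only this compact string reaches the model
}
```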
Measured impact
Anthropic reports a case where an end-to-end workflow dropped from about 150,000 tokens to roughly 2,000 tokens when converted to filesystem-based MCP APIs and a code execution loop. That corresponds to a 98.7 percent reduction in token usage for that scenario, with direct benefits in lower cost and reduced latency.
Design and operational benefits
Progressive tool discovery: The agent no longer needs to preload every tool definition. It can inspect the generated filesystem, list servers, and read only the modules it actually needs. This shifts tool catalogs out of the model context and into code, so tokens are spent only on relevant interfaces.
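The discovery step can be sketched as plain filesystem operations. The directory layout below is simulated with a temp directory, and the file names follow the servers/&lt;name&gt;/&lt;tool&gt;.ts convention described above:

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Simulate the wrapper tree an MCP client might generate.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "mcp-"));
const wrappers = [
  "servers/google-drive/getDocument.ts",
  "servers/salesforce/updateRecord.ts",
];
for (const rel of wrappers) {
  const full = path.join(root, rel);
  fs.mkdirSync(path.dirname(full), { recursive: true });
  fs.writeFileSync(full, "// wrapper stub\n");
}

// The agent lists servers first, then reads only the modules it needs,
// so tool catalogs never have to be preloaded into the model context.
export function listServers(rootDir: string): string[] {
  return fs.readdirSync(path.join(rootDir, "servers"));
}

export function listTools(rootDir: string, server: string): string[] {
  return fs.readdirSync(path.join(rootDir, "servers", server));
}
```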
Efficient data handling: Large datasets stay inside the execution environment. TypeScript can fetch a big spreadsheet through an MCP wrapper, filter rows, compute aggregates, and return only summaries or small examples to the model. Heavy data movement stays out of the model context.
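For instance, a script might reduce a fetched spreadsheet to a single aggregate before anything is returned to the model; the row shape and field names here are illustrative:

```typescript
// Hypothetical row shape for a spreadsheet fetched through an MCP
// wrapper.
interface Row {
  region: string;
  revenue: number;
}

// The full row set stays inside the execution environment; only the
// small aggregate object would be returned to the model.
export function summarizeRevenue(
  rows: Row[],
  region: string
): { region: string; total: number; count: number } {
  const matching = rows.filter((r) => r.region === region);
  const total = matching.reduce((sum, r) => sum + r.revenue, 0);
  return { region, total, count: matching.length };
}
```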
Privacy preserving operations: Sensitive fields can be tokenized inside the execution environment. The model sees placeholders while the MCP client maps and restores real values only when calling downstream tools, letting data move between servers without exposing raw identifiers to the model.
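A toy version of such a tokenizer is sketched below; the placeholder format and in-memory maps are assumptions, and a real MCP client would manage this mapping securely rather than in process-local state:

```typescript
// In-memory mapping between real values and placeholders. The model
// only ever sees the placeholder side.
const realToToken = new Map<string, string>();
const tokenToReal = new Map<string, string>();
let counter = 0;

// Replace a sensitive value with a stable placeholder before any text
// is surfaced to the model.
export function tokenize(value: string): string {
  const existing = realToToken.get(value);
  if (existing !== undefined) return existing;
  const token = `PII_${++counter}`;
  realToToken.set(value, token);
  tokenToReal.set(token, value);
  return token;
}

// Restore the real value just before calling a downstream MCP tool.
export function detokenize(token: string): string {
  return tokenToReal.get(token) ?? token;
}
```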
State and reusable skills: The filesystem can store intermediate files and helper scripts. Transformations or report generators can be saved in a skills directory and reused across sessions. Anthropic links this idea to Claude Skills, where collections of scripts and metadata form higher level capabilities.
Security and trade-offs
Pushing work into a sandboxed runtime reduces token costs and latency but raises operational concerns. Teams must treat code execution security seriously, control bindings and network access inside the isolate, and maintain secure handling of sensitive mappings. Converting MCP into an executable API surface improves efficiency but places new responsibilities on engineers and platform operators.
Implications for agent builders
Converting MCP servers into code APIs is a pragmatic way to attack the core scaling limits of context based agents. It reduces token overhead, localizes heavy data processing, and enables progressive discovery and reuse. For many agents the pattern offers immediate cost and latency wins while changing how teams design and secure model integrations.