- Updated: February 25, 2026
- 6 min read
CLI vs MCP: Token Cost Savings and Lazy Loading Explained
CLI vs MCP: MCP sends the entire tool catalog as JSON at session start, while CLI lazily loads tool definitions, cutting token usage by up to 94 percent and dramatically lowering AI‑agent costs.
Why MCP’s Full‑Schema Dump Is 94 % More Expensive Than CLI’s Lazy Loading

Tech‑savvy developers constantly hunt for the cheapest way to run AI agents. A recent deep‑dive by Kan Yılmaz revealed that most agents built on the Model Context Protocol (MCP) overpay for token consumption—up to 94 % more than a comparable Command‑Line Interface (CLI) approach. This article unpacks the token‑cost math, explains lazy loading of tool schemas, compares Anthropic’s Tool Search, and shows how the open‑source CLIHub project flips the script by turning MCP servers into ultra‑lightweight CLIs.
Key Takeaways
- MCP injects every tool’s JSON schema at session start (≈ 15 540 tokens for 84 tools).
- CLI sends only a tiny skill list (≈ 300 tokens) and fetches details on demand.
- Per‑tool call cost: MCP ≈ 30 tokens vs. CLI ≈ 610 tokens (the CLI figure bundles a one‑time ~600‑token help‑text lookup that amortizes across repeated calls).
- Anthropic’s Tool Search mimics CLI lazy loading but still pulls full schemas, making it 40‑85 % more expensive than pure CLI.
- CLIHub automates the conversion of MCP endpoints into ready‑to‑run CLIs, slashing token bills without sacrificing functionality.
MCP’s Token‑Heavy Startup
When an AI agent boots, MCP’s default behavior is to dump the entire tool catalog into the conversation as a JSON schema. For a typical deployment—six MCP servers, each exposing 14 tools (total 84 tools)—the token count looks like this:
| Metric | Tokens |
|---|---|
| Full schema dump (84 tools) | ≈ 15 540 |
| Single tool call (after dump) | ≈ 30 |
The schema includes every parameter, description, and enum for each tool. While this guarantees the agent knows everything upfront, it inflates the token bill dramatically—especially for agents that only need a handful of tools.
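To see where those tokens go, here is an illustrative schema for a single made‑up tool, together with a crude token estimate (about four characters per token). The tool name and fields below are hypothetical, not taken from any real MCP server:

```python
import json

# Hypothetical schema for one tool. A real MCP catalog repeats a block
# like this for every tool: parameters, descriptions, enums and all.
notion_search_schema = {
    "name": "notion-search",
    "description": "Full-text search across Notion pages and databases.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query"},
            "scope": {
                "type": "string",
                "enum": ["pages", "databases", "all"],
                "description": "Where to search",
            },
            "limit": {"type": "integer", "description": "Max results"},
        },
        "required": ["query"],
    },
}

# Crude estimate: ~4 characters per token. Real schemas are richer still,
# which is how 84 tools add up to a ~15,540-token session preamble.
chars = len(json.dumps(notion_search_schema))
print(f"~{chars // 4} tokens for one tool, ~{84 * chars // 4} for 84 tools")
```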
CLI’s Lean, Lazy‑Loading Model
CLI takes a minimalist approach. At session start it sends a compact list of tool names, descriptions, and executable locations—roughly 50 tokens per server. The heavy lifting (full help text, flags, and examples) is fetched only when the agent explicitly asks for --help or runs a command.
```xml
<available_tools>
  <tool>
    <name>notion</name>
    <description>CLI for Notion</description>
    <location>~/bin/notion</location>
  </tool>
  <tool>
    <name>linear</name>
    <description>CLI for Linear</description>
    <location>~/bin/linear</location>
  </tool>
</available_tools>
```
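From the agent harness’s side, the discovery step is just a subprocess call. A minimal sketch in Python, assuming the binaries from the skill list above actually exist at those locations:

```python
import os
import subprocess

def discover_tool(location: str) -> str:
    """Fetch a tool's full help text on demand instead of embedding it
    in the session preamble."""
    path = os.path.expanduser(location)
    result = subprocess.run([path, "--help"], capture_output=True,
                            text=True, check=False)
    return result.stdout or result.stderr

# The skill list above cost ~50 tokens per server; the heavy ~600-token
# help text is fetched here, for one tool, only when it is needed.
print(discover_tool("~/bin/notion"))
```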
Token breakdown for the same 84‑tool environment:
| Metric | Tokens |
|---|---|
| Skill list (6 servers) | ≈ 300 |
| Help text for one tool | ≈ 600 |
| Actual tool execution | ≈ 6 |
Even though the first call to a given tool is pricier (≈ 610 tokens: roughly 600 of help text plus a handful for execution), the overall session cost drops by more than 90 % when only a few tools are used.
CLI vs. MCP: Token Savings at Scale
| Tools Used | MCP Tokens | CLI Tokens | Savings |
|---|---|---|---|
| 1 tool | ~15 570 | ~910 | 94 % |
| 10 tools | ~15 840 | ~964 | 94 % |
| 100 tools | ~18 540 | ~1 504 | 92 % |
The numbers assume a typical 6‑server deployment. CLI’s lazy loading consistently outperforms MCP, especially when agents only need a subset of the available tools.
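The table follows directly from the per‑component figures quoted above. A quick sketch of the arithmetic (the small residual differences from the table are rounding in the source numbers):

```python
# Session-cost model built from the figures quoted in this article.
MCP_SCHEMA_DUMP = 15_540  # full JSON schemas for 84 tools at session start
MCP_PER_CALL = 30         # one tool invocation after the dump

CLI_SKILL_LIST = 300      # compact name/description/location list, 6 servers
CLI_HELP_FETCH = 600      # one on-demand --help lookup
CLI_PER_CALL = 6          # one actual execution

def mcp_tokens(calls: int) -> int:
    return MCP_SCHEMA_DUMP + MCP_PER_CALL * calls

def cli_tokens(calls: int) -> int:
    # Assumes the help text is fetched once and reused across calls.
    return CLI_SKILL_LIST + CLI_HELP_FETCH + CLI_PER_CALL * calls

for calls in (1, 10, 100):
    mcp, cli = mcp_tokens(calls), cli_tokens(calls)
    print(f"{calls:>3} tools: MCP ~{mcp:,}  CLI ~{cli:,}  "
          f"savings ~{1 - cli / mcp:.0%}")
```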
Anthropic’s Tool Search: A Hybrid Approach
Anthropic introduced Tool Search, which first loads a lightweight index and then fetches full JSON schemas on demand. The idea mirrors CLI’s lazy loading, but the on‑demand fetch still pulls the entire schema, keeping token usage higher than pure CLI.
```
// Session start (index only)
{
  "tools": ["notion-search", "linear-create", "github-issue", …]
}

// When a tool is needed
GET /tool-schema?name=notion-search   // returns full JSON schema
```
Compared to CLI, Tool Search saves roughly 40 % at session start, but every on‑demand schema fetch carries a 70‑80 % token overhead, which is how it ends up 40‑85 % more expensive than pure CLI over a full session. For organizations already locked into Anthropic models, it’s a decent compromise; for anyone else, CLI remains the most cost‑effective solution.
CLIHub: Turning MCP Servers into Feather‑Weight CLIs
Finding ready‑made CLIs for every tool can be a nightmare. CLIHub solves this by providing a one‑command converter that reads an MCP endpoint’s tool catalog and spits out a fully functional CLI wrapper. The generated CLIs inherit the same OAuth tokens and API endpoints, but they adopt CLI’s lazy‑loading behavior.
Typical workflow:
- Run `clihub generate --source https://mcp.example.com`.
- CLIHub creates a `bin/` directory with one executable per tool.
- Agents invoke the new CLI just like any native command, benefiting from the reduced token footprint (a hypothetical wrapper is sketched below).
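CLIHub’s generated output isn’t reproduced here, so the following is only a minimal sketch of what a per‑tool wrapper could look like; the notion-search tool, the MCP_ENDPOINT URL, and the flag names are all hypothetical:

```python
#!/usr/bin/env python3
"""Hypothetical wrapper for a single MCP tool, exposed as a CLI.

The point is the shape: the agent's session only carries the tool's
name and location, while full usage details surface via --help."""
import argparse
import json
import sys
import urllib.request

# Assumption: the wrapper forwards invocations to the original MCP server.
MCP_ENDPOINT = "https://mcp.example.com/tools/notion-search"

def main() -> None:
    parser = argparse.ArgumentParser(
        prog="notion-search",
        description="Search Notion pages (wraps the notion-search MCP tool).",
    )
    parser.add_argument("query", help="Full-text search query")
    parser.add_argument("--limit", type=int, default=10,
                        help="maximum number of results (default: 10)")
    args = parser.parse_args()

    payload = json.dumps({"query": args.query, "limit": args.limit}).encode()
    request = urllib.request.Request(
        MCP_ENDPOINT, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        # Print the tool's response so the agent can read it from stdout.
        sys.stdout.write(response.read().decode())

if __name__ == "__main__":
    main()
```

Because argparse generates the usage text, the ~600‑token help output lives on disk and is only paid for when an agent actually runs the command with --help.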
Because the conversion is automated, developers can spin up a new CLI ecosystem for any MCP service in seconds, instantly reaping the 94 % token savings demonstrated above.
What This Means for SaaS and AI‑First Companies
For startups and SMBs, token costs translate directly into operational expenses. A 94 % reduction can mean the difference between a profitable AI‑assistant and a cash‑burning experiment. Companies that adopt CLI‑style lazy loading can:
- Scale to hundreds of tools without exploding token bills.
- Maintain fast response times because the model only processes relevant schemas.
- Leverage the UBOS platform overview to see how CLI wrappers can be hosted alongside other micro‑services.
Moreover, the Enterprise AI platform by UBOS already supports custom CLI integrations, making the transition seamless for large teams.
Getting Started with UBOS for CLI‑Based AI Agents
If you’re ready to cut token costs, UBOS offers a suite of tools that align perfectly with the CLI approach:
- Workflow automation studio – design token‑efficient pipelines.
- Web app editor on UBOS – build UI front‑ends that call your new CLIs.
- UBOS templates for quick start – jump‑start projects with pre‑wired CLI integrations.
- UBOS pricing plans – choose a tier that matches your token‑saving goals.
For early‑stage teams, the UBOS for startups program provides credits and dedicated support to migrate from MCP to CLI‑based architectures.
Take Action Today
Stop letting hidden token taxes erode your AI budget. Evaluate your current MCP setup, generate a CLI with CLIHub, and explore the UBOS partner program for ongoing optimization.
For a full technical walkthrough, read the original article that sparked this analysis.