Tech professionals building agent systems must balance interoperability with runtime costs; MCP's metadata-driven approach affects latency, reliability, and inference costs. Decisions about agent tooling standards will influence architecture, deployment, and vendor choices.
Dossier last updated: 2026-05-29 23:34:00
Quandri Engineering tested the Model Context Protocol (MCP) on a real stack and found it costly: MCP tool schemas consume a substantial share of the LLM context window (e.g., 77 tools ≈ 21k tokens, or 10.5% of a 200k-token Claude window and 16.5% of GPT-4o’s 128k window), reducing the space left for actual prompts. They measured large schema sizes (the Linear definitions alone take ~12.8k tokens) and report reliability and performance problems: init failures, crashes, slower responses due to extra process round-trips, and opaque permissions. MCP also duplicates functionality already available via CLIs and APIs, while losing composability, debuggability, and human parity. An update notes that Claude Code’s Deferred Loading mitigates the context bloat, but Quandri argues that the performance, debugging, and architectural drawbacks remain relevant.
Quandri Engineering analyzed the Model Context Protocol (MCP) and found it impractical for production use: MCP consumes substantial LLM context, introduces reliability and latency problems, and duplicates existing CLI/API capabilities. Measurements on a Quandri stack show 77 tools from four MCP servers consuming ~21k tokens (10.5% of a 200k-token Claude window, 16.5% of a 128k GPT-4o window), with individual tool schemas such as linear/save_issue using ~619 tokens. Operational issues include init failures, process maintenance, slower round-trips (benchmarks show MCP calls running multiple times slower than direct REST), mid-session crashes, and opaque permissions. Quandri argues that CLI/API paths remain more composable, debuggable, and efficient. A later note adds that Claude Code’s deferred loading reduces the context bloat, but the other concerns persist.
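The context-share percentages quoted above are simple ratios, and a minimal Python sketch makes the arithmetic easy to rerun for other tool counts or window sizes. The function and constant names here are illustrative and do not come from the Quandri article, and the input is the rounded ~21k-token figure. With that rounded input the GPT-4o share comes out near 16.4% rather than the reported 16.5%, presumably because the measured total was slightly above 21k (an assumption).

```python
def context_share(schema_tokens: int, window_tokens: int) -> float:
    """Return the percentage of a context window taken up by tool schemas."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    return 100.0 * schema_tokens / window_tokens


def average_tokens_per_tool(schema_tokens: int, tool_count: int) -> float:
    """Return the mean schema size per tool, in tokens."""
    if tool_count <= 0:
        raise ValueError("tool_count must be positive")
    return schema_tokens / tool_count


# Rounded figures taken from the summary above (assumed, not exact measurements).
SCHEMA_TOKENS = 21_000
TOOL_COUNT = 77
WINDOWS = {"Claude (200k)": 200_000, "GPT-4o (128k)": 128_000}

if __name__ == "__main__":
    for name, window in WINDOWS.items():
        share = context_share(SCHEMA_TOKENS, window)
        print(f"{name}: {share:.1f}% of the window used by tool schemas")
    avg = average_tokens_per_tool(SCHEMA_TOKENS, TOOL_COUNT)
    print(f"Average schema size: {avg:.0f} tokens per tool")
```

The average is only a rough guide: the reported ~619 tokens for linear/save_issue shows that individual schemas can be far larger than the mean.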
Anthropic’s MCP (Model Context Protocol), launched in Nov 2024 and now widely adopted, promised a unified open standard for models to call external tools, but production use in late 2025–2026 revealed a significant drawback: MCP’s tool metadata consumes large amounts of model context, driving up cost and latency. Examples of this context bloat show dozens of tool schemas consuming tens of thousands of tokens, and Anthropic engineers have acknowledged the issue. The debate intensified when Peter Steinberger, creator of the viral OpenClaw agent, argued that traditional CLIs (bash) are often more efficient because LLMs already know common command patterns and so avoid MCP’s metadata overhead. The article offers a decision framework for choosing between MCP and CLI based on trade-offs in cost, latency, reliability, and architecture.
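Very excited: EverMe is finally here. One command and 2 minutes to connect all your on-device memory, completely free. This is our flagship consumer product. Log in, copy a one-line command, and give it to your Codex, Claude Code, OpenClaw, or Hermes, or even other agents. That connects their memory across sessions and across agents. What makes it powerful is that it is not just text memory but agent memory: memory that keeps learning and evolving and can form cases and skills, so it gets smarter the more you use it. It will be open source and completely free. You are welcome to try it, and feedback is even more welcome 😀. https://t.co/2jhtRGR9qD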