MCP Context Bloat Is Breaking Enterprise AI Agents
Every MCP server injects its full tool schema (500-1500 tokens per tool) on initialization. In a 3-server enterprise setup (GitHub + Slack + Sentry, ~40 tools), 143,000 of 200,000 tokens are consumed before the agent processes a single query. Agents start forgetting, truncating, and failing. Claude Code's tool-search deferred loading cut this 85%, but the problem is still live for any stack not running the latest Claude Code version. Multiple HN threads, paid solutions, and open-source workarounds confirm it's a real deployment blocker.
Score Breakdown
Social Proof 2 sources
Gap Assessment
Several tools (mcp-compressor, mcp2cli, MCPlexor, Claude Code deferred loading) address it, but none are universal โ solutions only work within specific clients or require manual config. Wide-open for a protocol-level fix.