Tool Attention Is All You Need: Dynamic Tool Gating and Lazy Schema Loading for Eliminating the MCP/Tools Tax in Scalable Agentic Workflows
arXiv cs.AI / 4/25/2026
Key Points
- The paper argues that MCP-style integrations impose a hidden “MCP/Tools Tax,” with per-turn overhead in the range of ~10k–60k tokens in typical multi-server agent deployments.
- It introduces “Tool Attention,” a middleware approach that gates tool use via intent-based similarity (ISO score), state-aware preconditions/access scopes, and two-phase lazy schema loading.
- The lazy loader keeps only compact schema summaries in context and promotes full JSON schemas only for the top-k tools that pass gating, reducing unnecessary token payloads.
- In a simulated 120-tool, six-server benchmark (with token counts calibrated to audited real MCP deployments), Tool Attention cuts per-turn tool tokens by 95.0% (47.3k → 2.4k) and boosts effective context utilization from 24% to 91%.
- The paper reports end-to-end improvements (task success, latency, cost, reasoning quality) as projected results derived from token measurements rather than experiments on live LLM agents, and provides the code on GitHub.
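The gating-plus-lazy-loading pipeline described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the paper's ISO score is approximated here by a simple bag-of-words cosine similarity, and the tool/schema structures and thresholds are hypothetical stand-ins.

```python
import math
from collections import Counter
from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    summary: str        # compact description kept in context at all times
    full_schema: dict   # full JSON schema, promoted only after gating

def similarity(intent: str, summary: str) -> float:
    """Bag-of-words cosine similarity; a stand-in for the paper's ISO score."""
    va, vb = Counter(intent.lower().split()), Counter(summary.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def gate_and_load(intent: str, tools: list[Tool], allowed: set[str],
                  k: int = 3, threshold: float = 0.1) -> list[dict]:
    """Two-phase lazy loading: phase 1 scores compact summaries and filters
    by access scope; phase 2 promotes full schemas only for the top-k tools
    that pass the gate, so untouched schemas never enter the context."""
    scored = [(similarity(intent, t.summary), t)
              for t in tools if t.name in allowed]          # state/scope gate
    scored = [(s, t) for s, t in scored if s >= threshold]  # intent gate
    scored.sort(key=lambda st: st[0], reverse=True)
    return [t.full_schema for _, t in scored[:k]]           # promote top-k
```

With 120 tools, only the short summaries would sit in the prompt each turn; the handful of full schemas returned by `gate_and_load` are the only large payloads the model ever sees.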