Subagents Account for Most Token Costs in Long Agent Runs: Fixes That Cut Usage 70 to 90 Percent in Practice
Reddit r/artificial / 6/3/2026
Key Points
- A 2026 study (Bai et al.) of agentic coding on SWE-bench found that multi-agent/agentic tasks use about 1,000x more tokens than ordinary chat, that token usage varies widely, and that accuracy does not improve in proportion to spend.
- In practice, one tracked research-synthesis run that had reached 450,000 context tokens dropped to around 85,000 tokens (roughly an 81% reduction) after controls were added that keep key files (PLAN.md, INVARIANTS.md) and fresh budgets out of the rolling conversation window.
- The post's proposed fixes include a per-turn read-budget gate of 2,000 lines, keeping subagent coordination notes out of the main transcript, and rethinking which historical content gets re-queried.
- For dynamic tool discovery, one harness cut input tokens by 96% and total cost by 90% by loading tool schemas only for tools the agent actually selects instead of injecting a full catalog every time.
- The post concludes with an implementation checklist and invites readers to share token/cost issues seen in their own long-running agent sessions.
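
Two of the fixes above, the per-turn read-budget gate and on-demand tool schema loading, can be sketched in a few lines. This is a minimal illustration only: the summary does not show the original post's code, so the class names `ReadBudget` and `LazyToolCatalog`, their methods, and the schema shape are all assumptions made for the example. Only the 2,000-line default comes from the post.

```python
from dataclasses import dataclass, field


@dataclass
class ReadBudget:
    """Hypothetical per-turn gate on how many lines an agent may read."""

    limit: int = 2000  # the 2,000-line figure is the one quoted in the post
    used: int = 0

    def reset(self) -> None:
        # Call at the start of each turn so the turn gets a fresh budget.
        self.used = 0

    def read(self, lines: list[str]) -> list[str]:
        # Grant only as many lines as the remaining budget allows.
        remaining = max(self.limit - self.used, 0)
        granted = lines[:remaining]
        self.used += len(granted)
        return granted


@dataclass
class LazyToolCatalog:
    """Hypothetical catalog: names are cheap to list, schemas load on selection."""

    schemas: dict[str, dict]
    loaded: set[str] = field(default_factory=set)

    def names(self) -> list[str]:
        # Short name list shown to the agent instead of the full catalog.
        return sorted(self.schemas)

    def select(self, name: str) -> dict:
        # Mark the tool as selected and return its full schema.
        self.loaded.add(name)
        return self.schemas[name]

    def prompt_schemas(self) -> list[dict]:
        # Only schemas for selected tools are injected into the prompt.
        return [self.schemas[n] for n in sorted(self.loaded)]
```

In this sketch the harness would call `reset()` once per turn and route every file read through `read()`, and it would build each prompt from `names()` plus `prompt_schemas()` rather than from the whole catalog. How much this saves depends on catalog size and read patterns; the percentages quoted above are the post's own measurements, not something this sketch reproduces.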