Subagents Account for Most Token Costs in Long Agent Runs: Fixes That Cut Usage 70 to 90 Percent in Practice

Reddit r/artificial / 6/3/2026

💬 Opinion | Developer Stack & Infrastructure | Ideas & Deep Analysis | Tools & Practical Usage | Models & Research

Key Points

  • A 2026 study (Bai et al.) of agentic coding on SWE-bench found that agentic and multi-agent tasks consume roughly 1,000x more tokens than ordinary chat, that token usage varies widely from run to run, and that accuracy does not rise in proportion to spend.
  • In practice, one tracked research-synthesis run that reached 450,000 context tokens dropped to about 85,000 tokens (a reduction of roughly 81%) after adding controls that hold key files (PLAN.md, INVARIANTS.md) and fresh budgets outside the rolling conversation window.
  • Other fixes the article proposes include a per-turn read-budget gate capped at 2,000 lines, keeping subagent coordination notes out of the main transcript, and rethinking which historical content gets re-queried.
  • With dynamic tool discovery, one harness cut input tokens by 96% and total cost by 90% by loading schemas only for the tools the agent actually selects, instead of injecting the full tool catalog on every call.
  • The post concludes with an implementation checklist and invites readers to share token/cost issues seen in their own long-running agent sessions.
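The summary does not say how the key files and budgets were held outside the rolling window. A minimal sketch of one way to do it, assuming a crude word count as a stand-in for a real tokenizer: PLAN.md and INVARIANTS.md live in a separate pinned store that is re-injected in full every turn, the rolling window evicts its oldest turns once it exceeds a budget, and subagent coordination notes go to their own list rather than the main transcript. `ContextManager`, `approx_tokens`, and `window_budget` are hypothetical names, not taken from the article.

```python
from collections import deque


def approx_tokens(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


class ContextManager:
    """Hypothetical sketch: pinned files live outside the rolling conversation window."""

    def __init__(self, pinned: dict[str, str], window_budget: int) -> None:
        self.pinned = pinned                  # e.g. {"PLAN.md": ..., "INVARIANTS.md": ...}
        self.window_budget = window_budget    # token budget for the rolling window only
        self.window: deque[str] = deque()     # rolling conversation turns
        self.subagent_notes: list[str] = []   # coordination notes, never sent in the main prompt

    def add_turn(self, message: str) -> None:
        self.window.append(message)
        # Evict the oldest turns until the window fits its budget (always keep the newest turn).
        while (sum(approx_tokens(m) for m in self.window) > self.window_budget
               and len(self.window) > 1):
            self.window.popleft()

    def add_subagent_note(self, note: str) -> None:
        # Stored separately so subagent chatter does not grow the main transcript.
        self.subagent_notes.append(note)

    def build_prompt(self) -> str:
        # Pinned files are re-injected every turn, so window eviction can never drop them.
        header = "\n\n".join(f"## {name}\n{body}" for name, body in self.pinned.items())
        return header + "\n\n" + "\n".join(self.window)
```

Because the pinned files are rebuilt from their own store on every call, their cost is fixed per turn instead of accumulating as copies inside the transcript.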
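The per-turn 2,000-line read-budget gate from the key points can be sketched as a small wrapper that every file read passes through. This is an assumption about the mechanism, not the author's code: `ReadBudgetGate`, `start_turn`, and the truncation messages are hypothetical, and a real harness would call `start_turn()` at the start of each agent turn.

```python
from dataclasses import dataclass


@dataclass
class ReadBudgetGate:
    """Hypothetical gate capping how many file lines an agent may read per turn."""

    max_lines_per_turn: int = 2000
    used: int = 0

    def start_turn(self) -> None:
        # Each new turn gets a fresh budget.
        self.used = 0

    def read(self, text: str) -> str:
        """Return as many lines of `text` as the remaining budget allows."""
        lines = text.splitlines()
        remaining = self.max_lines_per_turn - self.used
        if remaining <= 0:
            return "[read budget exhausted for this turn]"
        granted = lines[:remaining]
        self.used += len(granted)
        if len(lines) > remaining:
            # Tell the agent what was cut so it can request a narrower range next turn.
            granted.append(f"[truncated: {len(lines) - remaining} lines over budget]")
        return "\n".join(granted)
```

For example, after `start_turn()`, a 1,500-line read passes through whole, and a following 1,000-line read returns only its first 500 lines plus a truncation note; any further read that turn gets the "exhausted" message.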
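The dynamic tool discovery point can be approximated by showing the model only a compact catalog (one line per tool) and expanding full schemas just for the tools it selects. The registry, tool names, and helper functions below are hypothetical, and how the agent signals its selection is left out. The savings grow with catalog size, since the up-front cost shrinks to one short line per tool instead of a full schema.

```python
import json

# Hypothetical registry: each entry is a full tool schema, which can be long.
TOOL_SCHEMAS: dict[str, dict] = {
    "search_files": {
        "description": "Search the repository for a regex pattern.",
        "parameters": {"type": "object",
                       "properties": {"pattern": {"type": "string"}},
                       "required": ["pattern"]},
    },
    "read_file": {
        "description": "Read a file, optionally a line range.",
        "parameters": {"type": "object",
                       "properties": {"path": {"type": "string"},
                                      "start": {"type": "integer"},
                                      "end": {"type": "integer"}},
                       "required": ["path"]},
    },
    "run_tests": {
        "description": "Run the test suite and return failures.",
        "parameters": {"type": "object",
                       "properties": {"selector": {"type": "string"}}},
    },
}


def catalog_summary(schemas: dict[str, dict]) -> str:
    """Compact index the model sees up front: one line per tool, no parameters."""
    return "\n".join(f"- {name}: {spec['description']}" for name, spec in schemas.items())


def load_selected(schemas: dict[str, dict], selected: list[str]) -> list[dict]:
    """Full schemas only for the tools the agent picked; unknown names are skipped."""
    return [{"name": name, **schemas[name]} for name in selected if name in schemas]
```

For example, `load_selected(TOOL_SCHEMAS, ["read_file", "deploy"])` returns a one-element list holding the full `read_file` schema, and the unknown name `"deploy"` is silently dropped.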
