How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks
arXiv cs.CL / 4/27/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The study analyzes token-consumption patterns in agentic coding tasks by examining trajectories from eight frontier LLMs on SWE-bench Verified to answer where tokens are spent, which models are more token-efficient, and whether usage can be predicted in advance.
- Agentic tasks are exceptionally token-expensive, consuming roughly 1,000× more tokens than code reasoning and code chat, and overall cost is dominated by input tokens rather than output (see the cost sketch after this list).
- Token usage is highly variable and stochastic: the same task can differ by as much as 30× in total tokens across runs, and higher token usage does not reliably improve accuracy (which often peaks at intermediate cost and then saturates).
- Token efficiency varies widely across models: Kimi-K2 and Claude-Sonnet-4.5 consume over 1.5 million more tokens on average than GPT-5 on the same tasks.
- Human-rated task difficulty only weakly reflects actual token costs, and frontier models generally struggle to predict their own token usage, typically underestimating real costs.
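The cost pattern in the second and third bullets comes down to simple arithmetic, and a minimal sketch makes it concrete. The per-million-token prices and per-run token counts below are hypothetical placeholders, not figures from the paper or any provider's price list; the sketch only illustrates how input tokens dominate dollar cost when an agent re-sends a growing context on every step, and how a max/min ratio over repeated runs of the same task captures the variability the authors report.

```python
# Hypothetical prices (USD per 1M tokens); real provider pricing differs.
PRICE_PER_M_INPUT = 3.00
PRICE_PER_M_OUTPUT = 15.00

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one agent run under the hypothetical prices above."""
    return (input_tokens * PRICE_PER_M_INPUT
            + output_tokens * PRICE_PER_M_OUTPUT) / 1_000_000

# Made-up trajectories for one SWE-bench-style task, repeated across runs.
# Agentic loops re-send the growing context each step, so input dwarfs output.
runs = [
    {"input_tokens": 2_400_000, "output_tokens": 40_000},
    {"input_tokens": 9_100_000, "output_tokens": 95_000},
    {"input_tokens":   650_000, "output_tokens": 18_000},
]

costs = [run_cost(r["input_tokens"], r["output_tokens"]) for r in runs]
totals = [r["input_tokens"] + r["output_tokens"] for r in runs]

input_share = sum(r["input_tokens"] for r in runs) / sum(totals)
spread = max(totals) / min(totals)  # run-to-run variability on the same task

print(f"input-token share of all tokens: {input_share:.1%}")
print(f"max/min total tokens across runs: {spread:.1f}x")
print(f"per-run cost (USD): {[f'{c:.2f}' for c in costs]}")
```

With these placeholder numbers the input side accounts for well over 95% of both tokens and spend, and the spread across runs is roughly an order of magnitude, which is the shape of the effect the paper measures at larger scale.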