Not All Turns Are Equally Hard: Adaptive Thinking Budgets For Efficient Multi-Turn Reasoning
arXiv cs.LG / 4/8/2026
Key Points
- The paper argues that LLM reasoning gains are plateauing, so improving inference-time compute efficiency is essential to cut unnecessarily long "thinking traces," especially in multi-turn settings where later turns depend on earlier ones.
- It formulates multi-turn reasoning as a sequential compute allocation problem using a multi-objective Markov Decision Process, then introduces TAB (Turn-Adaptive Budgets) to adaptively allocate token budgets per turn under a global per-problem token constraint.
- TAB is trained with Group Relative Policy Optimization (GRPO) to maximize accuracy while learning to spend fewer tokens on easier turns and reserve more tokens for harder, critical reasoning steps.
- Experiments on mathematical reasoning benchmarks show TAB achieves a better accuracy–token tradeoff, saving up to 35% of tokens while matching the accuracy of static and off-the-shelf budget baselines.
- The paper also proposes TAB All-SubQ, which leverages an available plan of sub-questions to allocate budgets across past and future sub-questions, yielding up to 40% token savings over baselines.
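The core allocation idea in the bullets above, spending a shared global token budget across turns in proportion to how hard each turn looks, can be illustrated with a toy sketch. Everything here (the difficulty scores, the proportional rule, and the `min_budget` floor) is a hypothetical illustration, not the paper's learned TAB policy, which is trained with GRPO rather than using a fixed formula.

```python
def allocate_budgets(difficulties, global_budget, min_budget=32):
    """Split a global token budget across turns in proportion to
    estimated per-turn difficulty, guaranteeing each turn a floor.

    This is an illustrative heuristic, not the paper's method:
    TAB learns its allocation policy with reinforcement learning.
    """
    n = len(difficulties)
    reserved = min_budget * n
    if reserved > global_budget:
        raise ValueError("global budget too small for per-turn minimum")
    spare = global_budget - reserved
    total = sum(difficulties)
    if total == 0:
        # All turns look equally easy: split the spare evenly.
        extra = [spare // n] * n
    else:
        # Harder turns (higher difficulty score) get more of the spare.
        extra = [int(spare * d / total) for d in difficulties]
    return [min_budget + e for e in extra]

# Three turns: one easy, one hard, one moderate, under a 1000-token cap.
budgets = allocate_budgets([0.1, 0.6, 0.3], global_budget=1000)
```

Under this sketch the hardest turn receives the largest share while the total never exceeds the global cap, mirroring the behavior the paper reports for TAB: fewer tokens on easy turns, more reserved for critical reasoning steps.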
Related Articles

Black Hat Asia
AI Business
[N] Just found out that Milla Jovovich is a dev, invested in AI, and just open sourced a project
Reddit r/MachineLearning

ALTK‑Evolve: On‑the‑Job Learning for AI Agents
Hugging Face Blog

Context Windows Are Getting Absurd — And That's a Good Thing
Dev.to

Every AI Agent Registry in 2026, Compared
Dev.to