Avoiding Overthinking and Underthinking: Curriculum-Aware Budget Scheduling for LLMs
arXiv cs.CL / 4/23/2026
Key Points
- The paper argues that current LLM test-time reasoning methods waste tokens by using fixed or uniformly sampled budgets, causing overthinking on easy problems and underthinking on hard ones.
- It proposes Budget-Adaptive Curriculum Reasoning (BACR), which improves both reasoning quality and token efficiency using (1) budget-conditioned unified policies, (2) a curriculum-aware budget scheduler driven by learning progress, and (3) truncation-aware dense rewards with process-level verification.
- The work also introduces Budget-Conditioned Advantage Estimation (BCAE), which reduces gradient variance by conditioning the advantage baseline on the sampled budget.
- Experiments on mathematical reasoning benchmarks (MATH, GSM8K, AIME, and Minerva Math) show consistent improvements across token budgets, including up to an 8.3% accuracy gain under tight budgets, alongside a 34% reduction in average token usage versus unconstrained reasoning.
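To make the budget-conditioned baseline idea concrete, here is a minimal sketch in Python. It is not the paper's implementation: the summary does not give the actual BCAE formula, so this assumes the simplest reading, in which each rollout's reward is compared against the mean reward of rollouts sampled at the same token budget rather than against a single global mean. The function names and the toy rewards are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def global_advantages(rewards):
    """Baseline shared by all rollouts, regardless of token budget."""
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


def budget_conditioned_advantages(rewards, budgets):
    """Assumed reading of a budget-conditioned baseline:
    subtract the mean reward of rollouts that share the same budget."""
    if len(rewards) != len(budgets):
        raise ValueError("rewards and budgets must have the same length")

    # Group rewards by the budget each rollout was sampled with.
    grouped = defaultdict(list)
    for reward, budget in zip(rewards, budgets):
        grouped[budget].append(reward)

    # One baseline per budget value.
    baselines = {budget: mean(rs) for budget, rs in grouped.items()}
    return [r - baselines[b] for r, b in zip(rewards, budgets)]


# Toy data (hypothetical): tight-budget rollouts succeed less often.
rewards = [0.0, 1.0, 1.0, 1.0]
budgets = [256, 256, 2048, 2048]

shared = global_advantages(rewards)
per_budget = budget_conditioned_advantages(rewards, budgets)
```

With a shared baseline, the tight-budget rollouts are penalized partly for having a small budget rather than for reasoning poorly. The per-budget baseline removes that offset, which is one plausible way conditioning on the budget could lower gradient variance.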