When More Thinking Hurts: Overthinking in LLM Test-Time Compute Scaling

arXiv cs.AI / 4/14/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper challenges the common assumption that increasing LLM test-time “reasoning” (longer chains of thought) monotonically improves outcomes.
  • It shows diminishing marginal returns at higher compute budgets, including “overthinking” where extra reasoning correlates with abandoning previously correct answers.
  • The authors demonstrate that optimal thinking length depends on problem difficulty, implying that fixed/uniform compute allocation is inefficient.
  • Using a cost-aware evaluation framework, they find that stopping at moderate reasoning budgets can substantially cut computation while preserving similar accuracy.
  • Overall, the work reframes test-time compute scaling as a problem of finding an optimal stopping point rather than simply maximizing reasoning length.

Abstract

Scaling test-time compute through extended chains of thought has become a dominant paradigm for improving large language model reasoning. However, existing research implicitly assumes that longer thinking always yields better results, an assumption that remains largely unexamined. We systematically investigate how the marginal utility of additional reasoning tokens changes as compute budgets increase. We find that marginal returns diminish substantially at higher budgets and that models exhibit “overthinking”, where extended reasoning is associated with abandoning previously correct answers. Furthermore, we show that optimal thinking length varies with problem difficulty, suggesting that uniform compute allocation is suboptimal. Our cost-aware evaluation framework reveals that stopping at moderate budgets can reduce computation significantly while maintaining comparable accuracy.
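The stopping idea behind the abstract can be sketched in a few lines: given accuracy measured at several reasoning-token budgets, choose the smallest budget whose accuracy is within a tolerance of the best observed accuracy. This is a hypothetical illustration with made-up numbers, not the paper's actual framework or results; the function name and data are invented for the example.

```python
def select_budget(budgets, accuracies, tolerance=0.01):
    """Return the smallest budget whose accuracy is within `tolerance`
    of the peak accuracy across all measured budgets."""
    best = max(accuracies)
    for budget, acc in sorted(zip(budgets, accuracies)):
        if acc >= best - tolerance:
            return budget

# Made-up numbers showing diminishing, then negative, marginal returns:
# accuracy plateaus and even dips at the largest budget ("overthinking").
budgets = [256, 512, 1024, 2048, 4096]
accuracies = [0.62, 0.71, 0.74, 0.75, 0.73]

print(select_budget(budgets, accuracies))  # -> 1024
```

Under these invented numbers, the rule stops at 1024 tokens, a quarter of the maximum budget, while staying within one point of peak accuracy, which mirrors the paper's claim that moderate budgets can cut computation while preserving comparable accuracy.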