Large Language Models Decide Early and Explain Later
arXiv cs.CL · April 27, 2026
Key Points
- The study examines whether large language model answers are already determined early during chain-of-thought generation, and whether later reasoning becomes “post-decision” explanation that adds cost without improving correctness.
- Using forced answer completion on Qwen3-4B across multiple datasets, the authors find that the predicted final answer changes in only 32% of queries, and that after the point where the final answer last changes, the model goes on to generate about 760 additional reasoning tokens on average.
- The results point to substantial redundancy in chain-of-thought generation: much of the later reasoning does not alter the final answer.
- The paper proposes early-stopping strategies (including probe-based stopping) that halt generation once the answer stabilizes, cutting roughly 500 reasoning tokens per query at the cost of only a ~2% drop in accuracy (a minimal sketch of such a strategy follows this list).
- Overall, the work motivates inference-time techniques to cut latency and inference cost by stopping redundant reasoning while largely preserving performance.
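To make the early-stopping idea concrete, here is a minimal sketch of answer-stability stopping: generate the chain of thought in chunks, periodically force an answer completion, and stop once the provisional answer stops changing. This is an illustration, not the paper's implementation; the `generate` callable, the "Final answer:" suffix, and the chunk/patience parameters are all assumptions made for the example.

```python
# Hedged sketch of answer-stability early stopping for chain-of-thought decoding.
# Assumptions (not from the paper): `generate(prompt, max_new_tokens)` is any
# callable that continues the given text; the forced-answer suffix and the
# chunk/patience sizes below are illustrative choices, not the paper's values.

def extract_answer(text: str) -> str:
    """Take the first line of the forced completion as the provisional answer."""
    stripped = text.strip()
    return stripped.splitlines()[0].strip() if stripped else ""

def early_stop_cot(generate, question: str,
                   chunk_tokens: int = 64,   # reasoning tokens generated per step
                   probe_tokens: int = 8,    # tokens allowed for the forced answer
                   patience: int = 3,        # identical consecutive answers to stop
                   max_chunks: int = 32):
    """Generate chain-of-thought in chunks; stop once the forced answer stabilizes."""
    cot = ""
    last_answer, stable = None, 0
    for _ in range(max_chunks):
        # 1. Extend the chain of thought by one chunk.
        cot += generate(question + cot, max_new_tokens=chunk_tokens)
        # 2. Force an answer without letting the model keep reasoning.
        forced = generate(question + cot + "\nFinal answer:", max_new_tokens=probe_tokens)
        answer = extract_answer(forced)
        # 3. Count how long the provisional answer has stayed the same.
        stable = stable + 1 if answer == last_answer else 1
        last_answer = answer
        if stable >= patience:
            break  # answer has stabilized; further reasoning is likely redundant
    return last_answer, cot

if __name__ == "__main__":
    # Toy stand-in for a real LM call, just to make the sketch runnable.
    def fake_generate(prompt, max_new_tokens):
        return " 42" if "Final answer:" in prompt else " ...some reasoning..."

    answer, cot = early_stop_cot(fake_generate, "Q: What is 6 * 7?\n")
    print(answer)  # -> "42"
```

In practice the forced-answer probe could be replaced by the paper's probe-based variant (a lightweight classifier over hidden states predicting whether the answer has settled), which avoids the extra decoding cost of repeatedly forcing completions.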