Think Twice Before You Write -- an Entropy-based Decoding Strategy to Enhance LLM Reasoning

arXiv cs.AI / 4/2/2026


Key Points

  • The paper proposes an entropy-guided decoding strategy that adaptively decides when to branch during LLM generation based on token-level uncertainty, aiming to reduce error propagation and unnecessary exploration.
  • Instead of uniformly applying sampling or self-consistency rollouts, it maintains a dynamic pool of partial rollouts and expands it primarily at high-entropy (vulnerable) positions.
  • To lower overhead, the method uses a rollout-level “Entropy After </Think> (EAT)” stopping criterion, evaluating entropy after the full reasoning trace rather than at every intermediate step.
  • Experiments on GSM8K, AMC2023, and perturbed variants show consistently strong accuracy; on smaller models, results are comparable to GPT-5 at a fraction of the cost.
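The branch-at-uncertainty idea in the points above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: `step_fn`, `tau`, `top_k`, and `max_pool` are hypothetical stand-ins for the model's next-token interface and the (unstated) entropy threshold and pool limits.

```python
import math
from typing import Callable, List, Tuple

def token_entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def expand_pool(
    pool: List[List[str]],
    step_fn: Callable[[List[str]], Tuple[List[str], List[float]]],
    tau: float = 1.0,      # illustrative entropy threshold
    top_k: int = 2,        # branches opened at a vulnerable position
    max_pool: int = 8,     # cap on concurrent partial rollouts
) -> List[List[str]]:
    """One decoding step over a dynamic pool of partial rollouts.

    At each position, compute the entropy of the next-token
    distribution. Confident (low-entropy) positions are extended
    greedily; high-entropy (vulnerable) positions branch into the
    top_k candidate tokens, concentrating computation where
    uncertainty is greatest.
    """
    new_pool: List[List[str]] = []
    for prefix in pool:
        tokens, probs = step_fn(prefix)  # candidates + their probabilities
        if token_entropy(probs) > tau and len(new_pool) + top_k <= max_pool:
            # vulnerable position: branch on the top-k candidates
            ranked = sorted(zip(probs, tokens), reverse=True)[:top_k]
            new_pool.extend(prefix + [t] for _, t in ranked)
        else:
            # confident position: single greedy extension, no exploration
            best = tokens[max(range(len(probs)), key=probs.__getitem__)]
            new_pool.append(prefix + [best])
    return new_pool
```

Repeating `expand_pool` until rollouts terminate yields the dynamic pool described above; branching happens only where the model is genuinely uncertain, unlike uniform self-consistency sampling.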

Abstract

Decoding strategies play a central role in shaping the reasoning ability of large language models (LLMs). Traditional methods such as greedy decoding and beam search often suffer from error propagation, while sampling-based approaches introduce randomness without adequate robustness. Self-consistency improves reliability by aggregating multiple rollouts, but incurs significant computational overhead. We propose an entropy-guided decoding framework that introduces token-level adaptivity into generation. At each step, the model computes the entropy of the token distribution, identifies high-uncertainty positions, and selectively branches on these vulnerable points. A dynamic pool of partial rollouts is maintained and expanded until solutions are completed, concentrating computation where uncertainty is greatest and avoiding unnecessary exploration in confident regions. To enable efficient termination, we apply a rollout-level Entropy After </Think> (EAT) stopping criterion, performing entropy evaluation after the full reasoning trace rather than incrementally at every step. Experiments on GSM8K, AMC2023, and their perturbed variants demonstrate that our method achieves consistently strong accuracy. Notably, on smaller LLMs, performance is comparable to GPT-5 while operating at a fraction of the cost.
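The rollout-level EAT criterion can likewise be sketched: instead of checking entropy at every intermediate step, entropy is evaluated once over the tokens emitted after the full reasoning trace (i.e. after the closing think tag), and further rollouts are skipped when the answer looks confident. This is a minimal sketch under assumptions; `threshold` and the mean-entropy aggregation are illustrative choices, not the paper's stated ones.

```python
import math
from typing import List

def _entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of one token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def eat_should_stop(
    answer_dists: List[List[float]],
    threshold: float = 0.3,  # illustrative confidence threshold
) -> bool:
    """Rollout-level Entropy After </Think> (EAT) check (sketch).

    `answer_dists` holds the per-position token distributions produced
    *after* the full reasoning trace. If their mean entropy is low,
    the rollout is confident and no additional rollouts are launched,
    avoiding the cost of incremental step-by-step entropy evaluation.
    """
    if not answer_dists:
        return False  # nothing after the trace yet; keep going
    mean_h = sum(_entropy(d) for d in answer_dists) / len(answer_dists)
    return mean_h < threshold
```

Because the check runs once per completed rollout rather than once per token, its overhead is negligible next to generation itself, which is the efficiency argument the abstract makes.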