SPREG: Structured Plan Repair with Entropy-Guided Test-Time Intervention for Large Language Model Reasoning

arXiv cs.AI / 4/21/2026


Key Points

  • The paper introduces SPREG, a lightweight inference-time framework designed to fix logical hallucinations and entropy-driven drift in large language model (LLM) long-chain reasoning.
  • SPREG detects failure by monitoring real-time entropy and identifying “entropy spikes,” which serve as signals that reasoning has gone off track.
  • When an entropy spike is detected, SPREG performs surgical repair by replacing uninformative null-priors with reference distributions derived from historical high-confidence states.
  • The method adapts classifier-free guidance strength across structured reasoning stages (such as Action and Observation) to regain stability while preserving language fluency.
  • Experiments report a notable 20.0% absolute accuracy improvement on AIME25 and strong suppression of uncontrolled entropy drift in complex tasks.
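The dual-threshold spike detection sketched in the key points could look like the following. This is a minimal illustration, not the authors' implementation: the `EntropySpikeDetector` class, its threshold values, and the moving-average window size are all assumptions for exposition.

```python
import math
from collections import deque

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

class EntropySpikeDetector:
    """Hypothetical dual-threshold monitor: flags a spike only when the
    current entropy exceeds both an absolute ceiling and a multiple of a
    recent moving-average baseline (illustrative of SPREG's adaptive
    dual-threshold mechanism, not the paper's code)."""

    def __init__(self, abs_threshold=3.0, rel_threshold=1.5, window=16):
        self.abs_threshold = abs_threshold   # absolute entropy ceiling (nats)
        self.rel_threshold = rel_threshold   # multiplier over the baseline
        self.history = deque(maxlen=window)  # recent per-step entropies

    def update(self, entropy):
        """Record one step's entropy; return True if it counts as a spike."""
        baseline = sum(self.history) / len(self.history) if self.history else None
        spike = (
            entropy > self.abs_threshold
            and baseline is not None
            and entropy > self.rel_threshold * baseline
        )
        self.history.append(entropy)
        return spike
```

With these (assumed) thresholds, a run of steady low-entropy steps around 1 nat followed by a jump to 5 nats would be flagged as a spike, while the steady steps would not.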

Abstract

Large Language Models (LLMs) are prone to logical hallucinations and stochastic drifts during long-chain reasoning. While Classifier-Free Guidance (CFG) can improve instruction adherence, standard static implementations often cause semantic dilution and linguistic degradation. We propose SPREG (Structured Plan-guided Real-time Entropy Gating), a lightweight inference-time framework for surgical error rectification. SPREG employs an adaptive dual-threshold mechanism to monitor real-time entropy, identifying sudden "entropy spikes" as reliable indicators of logical failure. Upon detection, it triggers a dynamic repair by replacing uninformative null-priors with reference distributions synthesized from historical high-confidence states. By modulating guidance intensity according to structured reasoning stages (e.g., Action, Observation), SPREG steers the model back to a stable manifold without compromising fluency. Our experiments demonstrate significant gains, notably a 20.0% absolute accuracy improvement on AIME25, while effectively suppressing uncontrolled entropy drift in complex tasks.
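The repair step described in the abstract can be sketched as follows: logit vectors from past low-entropy (high-confidence) steps are banked, a reference distribution is synthesized as their mean, and it stands in for the null prior in a CFG-style update whose strength depends on the reasoning stage. Every name, threshold, and stage weight below is an illustrative assumption, not the paper's actual configuration.

```python
import math

# Hypothetical per-stage guidance strengths; the paper modulates CFG
# intensity by reasoning stage, but these numbers are made up.
STAGE_GAMMA = {"Action": 1.8, "Observation": 1.2, "Thought": 1.0}

def entropy_from_logits(logits):
    """Shannon entropy (nats) of softmax(logits), computed stably."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)

class ReferenceBank:
    """Stores logit vectors from high-confidence (low-entropy) steps and
    synthesizes a reference distribution as their elementwise mean."""

    def __init__(self, entropy_cap=1.0):
        self.entropy_cap = entropy_cap  # assumed confidence cutoff (nats)
        self.states = []

    def maybe_store(self, logits):
        if entropy_from_logits(logits) < self.entropy_cap:
            self.states.append(list(logits))

    def reference(self):
        n = len(self.states)
        return [sum(col) / n for col in zip(*self.states)]

def cfg_repair(cond_logits, ref_logits, stage):
    """CFG-style update with the synthesized reference replacing the
    uninformative null prior: guided = ref + gamma * (cond - ref)."""
    gamma = STAGE_GAMMA.get(stage, 1.0)
    return [r + gamma * (c - r) for c, r in zip(cond_logits, ref_logits)]
```

Note that with gamma = 1 the guided logits reduce to the conditional logits (no intervention), while gamma > 1 pushes the distribution away from the reference toward the conditional direction, which is the usual CFG behavior.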