When Chain-of-Thought Fails, the Solution Hides in the Hidden States

arXiv cs.CL / 4/28/2026


Key Points

  • The study examines whether chain-of-thought (CoT) intermediate tokens are computationally useful by testing if token-level hidden states contain task-relevant information.
  • Using mechanistic causal analysis with activation patching on GSM8K, the researchers transfer hidden states from a CoT run into a direct-answer run and find that patched generation can significantly outperform both direct prompting and the original (possibly incorrect) CoT trace (a sketch of the patching step follows this list).
  • Task-relevant information in CoT appears more often in correct than incorrect runs, is unevenly distributed across tokens, and concentrates in mid-to-late transformer layers, often appearing earlier in the reasoning trace.
  • The paper finds that linguistic tokens (e.g., verbs and entities) are more likely to steer reasoning toward correctness, while mathematical tokens tend to encode answer-proximal details that are less effective for recovery.
  • Patched outputs are frequently shorter than full CoT chains yet achieve higher accuracy, implying that complete step-by-step reasoning traces may not always be required to solve the problem.
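
A minimal sketch of that patching step, assuming a LLaMA-style decoder served through Hugging Face transformers. The model name, prompts, layer index, and token positions below are illustrative placeholders, not values from the paper:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder model, not the paper's
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()

layer_idx = 20              # a mid-to-late decoder layer (placeholder)
src_pos, tgt_pos = 57, 12   # token positions in the CoT / direct runs (placeholders)

# 1) Run the CoT prompt once and record one token's hidden state at one layer.
cot_ids = tok("Q: ... Let's think step by step. ...", return_tensors="pt").input_ids
with torch.no_grad():
    cot_out = model(cot_ids, output_hidden_states=True)
# hidden_states[layer_idx + 1] is the output of decoder layer `layer_idx`
donor = cot_out.hidden_states[layer_idx + 1][0, src_pos].clone()

# 2) Re-run the direct-answer prompt, overwriting that hidden state in place.
def patch_hook(module, inputs, output):
    hs = output[0] if isinstance(output, tuple) else output
    if hs.shape[1] > tgt_pos:  # patch only on the prefill pass, not cached steps
        hs[0, tgt_pos] = donor.to(hs.dtype)  # in-place edit; no return needed

handle = model.model.layers[layer_idx].register_forward_hook(patch_hook)
direct_ids = tok("Q: ... Answer:", return_tensors="pt").input_ids
patched = model.generate(direct_ids, max_new_tokens=16)
handle.remove()
print(tok.decode(patched[0], skip_special_tokens=True))
```

Note that the `model.model.layers` path assumes a LLaMA-family architecture; other model classes expose their decoder blocks under different attribute names.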

Abstract

Whether intermediate reasoning is computationally useful or merely explanatory depends on whether chain-of-thought (CoT) tokens contain task-relevant information. We present a mechanistic causal analysis of CoT on GSM8K using activation patching: transferring token-level hidden states from a CoT generation to a direct-answer run for the same question, then measuring the effect on final-answer accuracy. Across models, generating after patching yields substantially higher accuracy than both direct-answer prompting and the original CoT trace, revealing that individual CoT tokens can encode sufficient information to recover the correct answer, even when the original trace is incorrect. This task-relevant information is more prevalent in correct than incorrect CoT runs and is unevenly distributed across tokens, concentrating in mid-to-late layers and appearing earlier in the reasoning trace. Moreover, patched language tokens such as verbs and entities carry task-solving information that steers generation toward correct reasoning, whereas mathematical tokens encode answer-proximal content that rarely recovers the correct answer. Patched outputs are often shorter than full CoT traces and yet exceed their accuracy, suggesting that complete reasoning chains are not always necessary. Together, these findings demonstrate that CoT encodes recoverable, token-level problem-solving information, offering new insight into how reasoning is represented and where it breaks down.
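
The claim that task-relevant information concentrates in particular layers and trace positions implies a sweep over (layer, token-position) cells, scoring patched accuracy at each. Below is a self-contained, hypothetical skeleton of such a sweep; the `run_patched` callable stands in for the hook-based procedure above, and the toy oracle in the demo is invented purely to make the example runnable:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Problem:
    question: str
    gold: str

def localization_sweep(
    run_patched: Callable[[Problem, int, int], str],
    problems: List[Problem],
    layers: range,
    positions: range,
) -> Dict[Tuple[int, int], float]:
    """Patched-answer accuracy for each (layer, token-position) cell."""
    grid: Dict[Tuple[int, int], float] = {}
    for layer in layers:
        for pos in positions:
            hits = sum(
                run_patched(p, layer, pos).strip() == p.gold for p in problems
            )
            grid[(layer, pos)] = hits / len(problems)
    return grid

# Toy demo with a fake model in which the recoverable information "lives"
# in layers >= 16 and early trace positions (an invented pattern).
if __name__ == "__main__":
    probs = [Problem("2 + 2 = ?", "4"), Problem("3 * 3 = ?", "9")]
    fake = lambda p, layer, pos: p.gold if layer >= 16 and pos < 8 else "?"
    grid = localization_sweep(fake, probs, range(0, 32, 8), range(0, 16, 8))
    best = max(grid, key=grid.get)
    print(f"best cell: layer={best[0]}, pos={best[1]}, acc={grid[best]:.2f}")
```

In the paper's terms, cells with high patched accuracy in mid-to-late layers and early trace positions would correspond to where CoT's recoverable problem-solving information resides.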