The Detection--Extraction Gap: Models Know the Answer Before They Can Say It

arXiv cs.CL / 4/9/2026


Key Points

  • The paper finds a “detection–extraction gap,” where reasoning models generate substantial chain-of-thought after the correct answer is already recoverable from an early prefix (52–88% of CoT tokens are produced post-commitment).
  • It shows that free continuation decoding can recover the correct answer from as little as 10% of the trace, while forced extraction fails in 42% of those cases, implying the model state contains the answer but decoding choices prevent retrieval.
  • The authors formalize the mismatch by bounding the total-variation distance between free vs. forced continuation distributions, quantifying how the suffix induces a shift.
  • To address the gap, the paper proposes Black-box Adaptive Early Exit (BAEE), using free continuations for both detection and extraction to truncate 70–78% of serial generation and improve accuracy by 1–5 percentage points across tested models and benchmarks.
  • For “thinking-mode” models, early exit avoids post-commitment overwriting, with gains of up to 5.8 pp; a cost-optimized variant cuts API calls by 68–73%, with a median of nine calls. Code is released on GitHub.
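The adaptive early-exit idea in the key points above can be sketched as a simple control loop: emit chain-of-thought in chunks, probe each prefix with a free continuation, and stop once the probed answer stabilizes. This is an illustrative sketch, not the paper's implementation; `free_continue` here is a toy stub standing in for a real model call, and the stability rule is an assumed design choice.

```python
def free_continue(prefix):
    # Toy stand-in for sampling a free continuation from the model given
    # only the CoT prefix (no extraction prompt). Here it "commits" to an
    # answer once the prefix contains at least three reasoning chunks.
    return "42" if len(prefix) >= 3 else "?"

def adaptive_early_exit(chunks, stable_checkpoints=2):
    """Emit CoT chunk by chunk; probe each prefix with a free continuation
    and exit once the probed answer is identical across consecutive
    checkpoints, returning that answer and the number of chunks used."""
    prefix, history = [], []
    for chunk in chunks:
        prefix.append(chunk)
        history.append(free_continue(prefix))
        if (len(history) >= stable_checkpoints
                and len(set(history[-stable_checkpoints:])) == 1
                and history[-1] != "?"):
            # Answer has stabilized: truncate the remaining generation and
            # take the free-continuation answer as the extraction.
            return history[-1], len(prefix)
    return (history[-1] if history else None), len(prefix)

# With the toy stub, the loop exits after 4 of 10 chunks:
print(adaptive_early_exit(["step"] * 10))
```

Because both detection (has the answer stabilized?) and extraction (what is it?) use the same free continuations, no forced extraction prompt is ever needed, which is the asymmetry the paper exploits.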

Abstract

Modern reasoning models continue generating long after the answer is already determined. Across five model configurations, two families, and three benchmarks, we find that **52–88% of chain-of-thought tokens are produced after the answer is recoverable** from a partial prefix. This post-commitment generation reveals a structural phenomenon: the **detection–extraction gap**. Free continuations from early prefixes recover the correct answer even at 10% of the trace, while forced extraction fails on 42% of these cases. The answer is recoverable from the model state, yet prompt-conditioned decoding fails to extract it. We formalize this mismatch via a total-variation bound between free and forced continuation distributions, yielding quantitative estimates of suffix-induced shift. Exploiting this asymmetry, we propose Black-box Adaptive Early Exit (BAEE), which uses free continuations for both detection and extraction, truncating **70–78% of serial generation** while **improving accuracy by 1–5 pp** across all models. For thinking-mode models, early exit prevents post-commitment overwriting, yielding gains of up to 5.8 pp; a cost-optimized variant achieves a 68–73% reduction at a median of 9 API calls. Code is available at https://github.com/EdWangLoDaSc/know2say.
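The total-variation bound in the abstract compares the answer distribution under free continuation with the one under forced extraction. In a black-box setting these distributions can only be estimated from samples, so a minimal sketch (my assumption, not the paper's estimator) is the empirical TV distance between two bags of sampled answers:

```python
from collections import Counter

def tv_distance(samples_p, samples_q):
    """Empirical total-variation distance between two answer distributions,
    e.g. answers sampled via free continuation vs forced extraction.
    TV(P, Q) = (1/2) * sum_a |P(a) - Q(a)| over the joint support."""
    p, q = Counter(samples_p), Counter(samples_q)
    n_p, n_q = len(samples_p), len(samples_q)
    support = set(p) | set(q)  # Counter returns 0 for missing keys
    return 0.5 * sum(abs(p[a] / n_p - q[a] / n_q) for a in support)

# Free continuations mostly agree on "42"; forced extraction splits evenly:
free = ["42"] * 9 + ["7"]
forced = ["42"] * 5 + ["7"] * 5
print(tv_distance(free, forced))  # → 0.4
```

A large TV distance at a given prefix length is exactly the signature of the detection–extraction gap: the answer is present under free decoding, but the suffix imposed by forced extraction shifts the distribution away from it.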