The Reasoning Trap: An Information-Theoretic Bound on Closed-System Multi-Step LLM Reasoning

arXiv cs.CL / 5/5/2026

Key Points

  • The paper argues that closed-system multi-step LLM reasoning—especially multi-agent debate where agents iteratively refine each other’s outputs—can preserve answer accuracy while degrading the underlying faithfulness of the reasoning.
  • It introduces a framework with SFS (Supported Faithfulness Score) to test decomposed atomic claims against provided evidence, and reports decomposer-invariant rankings (Spearman rho = 1.0); a minimal sketch of the metric follows this list.
  • It proposes EGSR (Evidence-Grounded Socratic Reasoning) to replace adversarial debate with evidence-based inquiry, and claims it can recover reasoning faithfulness.
  • The core theory is a Data Processing Inequality (DPI) bound (Theorem 1) showing that mutual information between evidence E and later outputs O^{t+1} cannot increase under the assumed Markov chain, formalizing the “Reasoning Trap”; the bound is restated after this list.
  • Experiments on SciFact and FEVER show that DebateCV keeps 88% of baseline accuracy while SFS drops sharply, that vote-based MAD collapses SFS, and that EGSR reportedly recovers 98%; a companion rater study suggests the human agreement used to calibrate faithfulness metrics is itself unstable across languages and domains (see the Fleiss kappa sketch below).
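
To make SFS concrete, here is a minimal sketch of the recipe the paper describes: decompose an output into atomic claims, verify each against the provided evidence, and score the supported fraction. The `naive_supports` checker and all the toy data below are placeholders (the paper presumably uses a stronger entailment verifier), and its exact aggregation may differ.

```python
from typing import Callable, List
from scipy.stats import spearmanr

def sfs(claims: List[str], evidence: str,
        supports: Callable[[str, str], bool]) -> float:
    """Supported Faithfulness Score sketch: the fraction of
    decomposed atomic claims entailed by the evidence."""
    if not claims:
        return 0.0
    return sum(supports(claim, evidence) for claim in claims) / len(claims)

# Placeholder entailment check; a real verifier would use an NLI model.
def naive_supports(claim: str, evidence: str) -> bool:
    return claim.lower() in evidence.lower()

evidence = "ACE2 is the receptor for the SARS-CoV-2 spike protein."
claims = ["ace2 is the receptor", "the spike protein binds ace2"]
print(sfs(claims, evidence, naive_supports))  # 0.5

# Decomposer invariance: score the same systems under two different
# claim decomposers and compare the induced rankings. rho = 1.0 means
# the decomposers order the systems identically.
scores_decomposer_a = [0.91, 0.48, 0.02]  # invented system scores
scores_decomposer_b = [0.85, 0.40, 0.05]  # same systems, other decomposer
rho, _ = spearmanr(scores_decomposer_a, scores_decomposer_b)
print(rho)  # 1.0
```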
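
Theorem 1 itself is a single application of the Data Processing Inequality once the Markov assumption is granted. Restated from the abstract (the assumption is the paper's; the derivation is the textbook DPI step):

```latex
% Theorem 1 (DPI bound), restated from the abstract; the Markov
% assumption is the paper's, the inequality is the standard DPI step.
Under standard MAD the round outputs form a Markov chain
\[
  E \to O^{0} \to O^{1} \to \cdots \to O^{t} \to O^{t+1},
\]
i.e.\ $O^{t+1}$ depends on the evidence $E$ only through $O^{t}$.
The Data Processing Inequality then gives, for every round $t$,
\[
  I(E; O^{t+1}) \le I(E; O^{t}),
\]
and taking expectations over debate transcripts,
\[
  \mathbb{E}\!\left[I(E; O^{t+1})\right] \le \mathbb{E}\!\left[I(E; O^{t})\right].
\]
```

Accuracy on the final answer can persist while this quantity decays, which is exactly the gap SFS is built to expose.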
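
The cohort finding in the last point is reported as Fleiss' kappa, a chance-corrected agreement statistic for multiple raters. Below is a self-contained sketch of the standard formula (not the paper's code; the toy ratings are invented and chosen to land near zero, in the spirit of the reported kappa <= +0.018):

```python
import numpy as np

def fleiss_kappa(table: np.ndarray) -> float:
    """Fleiss' kappa for an N x k table of rating counts: rows are
    rated items, columns are categories, each row sums to the number
    of raters n. Standard textbook formula."""
    table = np.asarray(table, dtype=float)
    N = table.shape[0]
    n = table[0].sum()                          # raters per item
    p_j = table.sum(axis=0) / (N * n)           # category marginals
    P_i = (np.square(table).sum(axis=1) - n) / (n * (n - 1))
    P_bar = P_i.mean()                          # observed agreement
    P_e = np.square(p_j).sum()                  # chance agreement
    return float((P_bar - P_e) / (1 - P_e))

# Invented toy data: 4 items rated by 3 raters into 3 categories.
ratings = np.array([[3, 0, 0],
                    [1, 1, 1],
                    [0, 2, 1],
                    [1, 1, 1]])
print(round(fleiss_kappa(ratings), 3))  # ~ -0.021: near-zero agreement
```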

Abstract

When copies of the same language model are prompted to debate, they produce diverse phrasings of one perspective rather than diverse perspectives. Multi-agent debate (MAD), and more broadly closed-system reasoning where agents iteratively transform each other's outputs, tends to preserve answer accuracy while degrading the reasoning behind those answers. We name the multi-agent case the Debate Trap and the broader phenomenon the Reasoning Trap, offering a programmatic theory of evidence-grounded reasoning failure. The framework has three parts: (i) SFS (Supported Faithfulness Score), a claim-level metric verifying decomposed atomic claims against provided evidence (decomposer-invariant rankings: Spearman rho = 1.0); (ii) EGSR (Evidence-Grounded Socratic Reasoning), replacing adversarial argumentation with evidence-grounded inquiry; (iii) Theorem 1 (DPI Bound): under standard MAD, the chain E -> O^0 -> O^1 -> ... is Markov, and the Data Processing Inequality implies E[I(E;O^{t+1})] <= E[I(E;O^t)]. Three companion results -- open-system recovery (Theorem 2), EGSR accumulation (Lemma 2), and vote-aggregation floor (Proposition 1) -- partition multi-step LLM reasoning by its information-theoretic relationship to E. Across 16 conditions on SciFact (300 claims) and FEVER (1,000 claims), DebateCV (C13) preserves 88% of baseline accuracy while SFS drops 43%; majority-vote MAD (C15) reduces SFS to 1.7% of baseline (p < 10^{-6}, d = -0.96); EGSR recovers 98%. An R6 cohort study (Korean n=10x30 FEVER; English n=3x200 SciFact) finds inter-rater Fleiss kappa <= +0.018 with 0.8-1.4 Likert intra-rater shifts across language and domain -- the human agreement that faithfulness metrics have been calibrated against is not itself stable. We offer one falsifiable conjecture: any closed-system reasoning protocol preserving Theorem 1's Markov structure is, in expectation, subject to the same DPI bound.
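
Structurally, the difference between the closed-system loop that Theorem 1 penalizes and an EGSR-style open-system loop is whether the evidence re-enters at each round. Here is a hypothetical Python sketch of that distinction; this is one reading of the abstract, not the paper's protocol, and `step` stands in for a single LLM refinement call:

```python
from typing import Callable

# One refinement call: (evidence, previous_output) -> next_output.
# A real `step` would wrap an LLM; here it is left abstract.
Step = Callable[[str, str], str]

def closed_system(step: Step, o0: str, rounds: int) -> str:
    """Standard MAD-style loop: each round conditions only on the
    previous output, so E -> O^0 -> O^1 -> ... is Markov and
    Theorem 1's DPI bound applies."""
    out = o0
    for _ in range(rounds):
        out = step("", out)        # evidence never re-enters
    return out

def open_system(step: Step, evidence: str, o0: str, rounds: int) -> str:
    """EGSR-style loop (hypothetical reading): each round re-conditions
    on the evidence E, breaking the Markov structure the bound needs
    and allowing I(E; O^t) to be replenished (cf. Theorem 2)."""
    out = o0
    for _ in range(rounds):
        out = step(evidence, out)  # evidence re-injected every round
    return out
```

Under this reading, the abstract's closing conjecture is a claim about the first loop only: any protocol whose updates keep that evidence-free shape remains, in expectation, under the DPI bound.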