LLM Reasoning Is Latent, Not the Chain of Thought

arXiv cs.AI / 4/20/2026


Key Points

  • The paper argues that LLM “reasoning” should be studied as the formation of latent-state trajectories rather than as a faithful, observable chain-of-thought (CoT) on the surface.
  • It explains that claims about faithfulness, interpretability, reasoning benchmarks, and inference-time intervention all depend on what researchers take the primary object of reasoning to be.
  • The authors disentangle three frequently confounded factors and formalize competing hypotheses: reasoning via latent trajectories (H1), reasoning via explicit surface CoT (H2), or reasoning gains driven mainly by generic serial compute (H0).
  • By reorganizing prior empirical, mechanistic, and survey evidence and adding compute-audited examples that separate surface traces from latent interventions and matched budget increases, the paper finds that current evidence most strongly supports H1 as a default working hypothesis.
  • The paper recommends that the field adopt latent-state dynamics as the default object of study and evaluate reasoning using experimental designs that explicitly disentangle surface traces, latent states, and serial compute.

Abstract

This position paper argues that large language model (LLM) reasoning should be studied as latent-state trajectory formation rather than as faithful surface chain-of-thought (CoT). This matters because claims about faithfulness, interpretability, reasoning benchmarks, and inference-time intervention all depend on what the field takes the primary object of reasoning to be. We ask what that object should be once three often-confounded factors are separated and formalize three competing hypotheses: H1, reasoning is primarily mediated by latent-state trajectories; H2, reasoning is primarily mediated by explicit surface CoT; and H0, most apparent reasoning gains are better explained by generic serial compute than by any privileged representational object. Reorganizing recent empirical, mechanistic, and survey work under this framework, and adding compute-audited worked exemplars that factorize surface traces, latent interventions, and matched budget expansions, we find that current evidence most strongly supports H1 as a default working hypothesis rather than as a task-independent verdict. We therefore make two recommendations: the field should treat latent-state dynamics as the default object of study for LLM reasoning, and it should evaluate reasoning with designs that explicitly disentangle surface traces, latent states, and serial compute.
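To make the final recommendation concrete, below is a minimal illustrative sketch (not taken from the paper) of a 2x2x2 evaluation grid that crosses the three factors the authors say must be disentangled: whether a surface CoT trace is produced, whether a latent-state intervention is applied, and whether extra serial compute is granted. The `run_condition` callable, the factor names, and the crude main-effect attribution are hypothetical placeholders for whatever task suite, model interface, and statistics an experimenter would actually use.

```python
# Illustrative sketch only: a factorial evaluation grid over the three factors
# (surface CoT, latent intervention, serial compute). All names are hypothetical.
from itertools import product
from typing import Callable, Dict, Tuple

# (show_surface_cot, apply_latent_intervention, add_serial_compute)
Condition = Tuple[bool, bool, bool]


def evaluate_grid(run_condition: Callable[[Condition], float]) -> Dict[Condition, float]:
    """Run every cell of the 2x2x2 grid and return accuracy per condition.

    `run_condition` is assumed to evaluate a fixed task set under one condition
    while keeping the total compute budget audited and matched across cells.
    """
    return {cond: run_condition(cond) for cond in product([False, True], repeat=3)}


def attribute_gains(grid: Dict[Condition, float]) -> Dict[str, float]:
    """Crude main-effect estimates: mean accuracy change from toggling one factor,
    marginalizing over the other two. Large effects map loosely onto H2 (surface
    CoT), H1 (latent intervention), and H0 (generic serial compute)."""
    factor_names = ["surface_cot (H2)", "latent_intervention (H1)", "serial_compute (H0)"]
    effects = {}
    for i, name in enumerate(factor_names):
        on = [acc for cond, acc in grid.items() if cond[i]]
        off = [acc for cond, acc in grid.items() if not cond[i]]
        effects[name] = sum(on) / len(on) - sum(off) / len(off)
    return effects


# Usage with a dummy evaluator (placeholder scores, not real results):
#   grid = evaluate_grid(lambda cond: 0.5 + 0.1 * cond[1])
#   print(attribute_gains(grid))
```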