Spatiotemporal Hidden-State Dynamics as a Signature of Internal Reasoning in Large Language Models

arXiv cs.CL / 5/5/2026



Key Points

The paper tests whether a large language model’s extended “reasoning” traces reflect real internal computation or mere verbosity, and argues that coarse hidden-state analyses miss token- and layer-level structure.
It finds that successful reasoning trajectories show a characteristic spatiotemporal hidden-state pattern: broad temporal dynamics coupled with localized concentration across layers, which is weaker in non-reasoning models and in knowledge-heavy settings.
The authors formalize this as StALT (Spatiotemporal Amplitude of Latent Transition), a training-free statistic computed from hidden-state transitions between adjacent tokens weighted by per-token layer saliency.
Across multiple models and benchmarks, StALT can reliably distinguish correct from incorrect trajectories in reasoning-heavy regimes and works as a label-free correctness signal that competes with output-space and length-based baselines.
Intervention experiments indicate StALT changes systematically when the demand for internal reasoning is increased or reduced, providing evidence that it is tied to latent reasoning dynamics in LLMs.
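
The summary above gives StALT only in words, so the following is a minimal sketch of a statistic with the same general shape, not the paper's formula. It assumes hidden states arrive as an array of shape (tokens, layers, hidden_dim), measures each transition by its L2 norm, and takes a layer's saliency to be its share of that token's total transition magnitude. The function name `stalt_like_score` and all three choices are illustrative assumptions.

```python
import numpy as np


def stalt_like_score(hidden, eps=1e-8):
    """Illustrative StALT-style statistic (assumed form, not the paper's definition).

    hidden: array of shape (T, L, D) holding the hidden state of each of
            T generated tokens at each of L layers (D = hidden size).
    Returns a single float summarizing saliency-weighted transition size.
    """
    hidden = np.asarray(hidden, dtype=np.float64)
    if hidden.ndim != 3 or hidden.shape[0] < 2:
        raise ValueError("expected shape (T, L, D) with at least two tokens")

    # Temporal transition: change between adjacent tokens, per layer.
    # delta has shape (T - 1, L).
    delta = np.linalg.norm(hidden[1:] - hidden[:-1], axis=-1)

    # Assumed per-token layer saliency: each layer's share of the token's
    # total transition magnitude (rows sum to ~1, or to 0 if nothing moved).
    saliency = delta / (delta.sum(axis=1, keepdims=True) + eps)

    # Saliency-weighted amplitude per transition, averaged over time.
    per_token = (saliency * delta).sum(axis=1)
    return float(per_token.mean())
```

Under these assumptions the score is zero for a trajectory whose hidden states never change, and for a fixed total amount of change it is larger when that change is concentrated in a few layers than when it is spread evenly across all of them, which is the qualitative pattern the key points describe. In practice the input could be built by stacking the per-layer hidden states a model returns during generation.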

Abstract

Large reasoning models (LRMs) generate extended solutions, yet it remains unclear whether these traces reflect substantive internal computation or merely verbosity and overthinking. Although recent hidden-state analyses suggest that internal representations carry correctness-related signals, their coarse aggregations may obscure the token and layer structure underlying reasoning computation. We investigate hidden-state transitions across decoding steps and layers, and identify a distinct spatiotemporal pattern in LRMs: successful trajectories exhibit broad temporal dynamics with localized layer-wise concentration, while this structure is weaker in non-reasoning models and knowledge-heavy domains. We formalize this characteristic as Spatiotemporal Amplitude of Latent Transition (StALT), a training-free trajectory statistic that summarizes temporal changes between adjacent tokens weighted by within-token layer saliency. Across diverse models and benchmarks, StALT reliably separates correct from incorrect trajectories in reasoning-intensive regimes, providing a competitive label-free correctness signal alongside strong output-space and length-based baselines. Intervention analyses further show that this spatiotemporal amplitude responds systematically to manipulations that increase or reduce the demand for internal reasoning, supporting its association with latent reasoning dynamics in LRMs. These findings provide empirical evidence that LRMs exhibit measurable hidden-state dynamics and offer a practical probe for understanding internal computation beyond output-based evaluation.
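
The abstract says the statistic separates correct from incorrect trajectories but, as quoted here, does not name the metric used. One common way to quantify such separation for a label-free score is AUROC: the probability that a randomly chosen correct trajectory scores higher than a randomly chosen incorrect one. The sketch below computes it by direct pairwise comparison; using AUROC at all, and the helper name `auroc`, are assumptions for illustration rather than details from the paper.

```python
import numpy as np


def auroc(scores, labels):
    """Pairwise AUROC of a scalar score against binary correctness labels.

    scores: one score per trajectory (higher should mean "more likely correct").
    labels: 1/True for correct trajectories, 0/False for incorrect ones.
    Labels are used only for evaluation; the score itself needs none.
    """
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels, dtype=bool)
    if scores.shape != labels.shape:
        raise ValueError("scores and labels must have the same shape")

    pos = scores[labels]    # scores of correct trajectories
    neg = scores[~labels]   # scores of incorrect trajectories
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one correct and one incorrect trajectory")

    # Compare every (correct, incorrect) pair; ties count as half a win.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / diff.size)
```

A value of 0.5 means the score carries no ranking information and 1.0 means perfect separation. The same function can be applied to a length-based or output-space baseline to compare it with a hidden-state score on the same set of trajectories.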