From Human Cognition to Neural Activations: Probing the Computational Primitives of Spatial Reasoning in LLMs

arXiv cs.AI / 3/30/2026


Key Points

  • The paper uses mechanistic analysis tools to investigate whether LLM performance on spatial-reasoning benchmarks comes from structured internal spatial representations or from language-based heuristics.
  • It decomposes spatial reasoning into three computational primitives—relational composition, representational transformation, and stateful spatial updating—and evaluates controlled task families targeting each primitive.
  • Using multilingual single-pass inference (English, Chinese, Arabic) plus linear probing, sparse autoencoder feature analysis, and causal interventions, the authors find spatial-relevant information appears in intermediate layers and can causally affect outputs.
  • However, these internal spatial representations are described as transient, fragmented across task families, and only weakly integrated into final predictions, indicating limited robustness.
  • Cross-lingual experiments reveal “mechanistic degeneracy,” where similar behavioral performance can be produced by different internal pathways, suggesting reliance varies with context and language.
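The linear probing mentioned above can be illustrated with a toy sketch: fit a linear classifier on intermediate-layer activations and test whether it can decode a task-relevant spatial label better than chance. The synthetic "activations" below (a random cloud plus a planted linear signal) are a stand-in for real hidden states, not the paper's actual data; a least-squares probe replaces whatever probe family the authors used.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 400  # hypothetical hidden-state dimension and prompt count

# Synthetic stand-in for intermediate-layer activations: Gaussian noise
# plus a linear "spatial relation" signal along one random direction.
direction = rng.normal(size=d)
labels = rng.integers(0, 2, size=n)  # e.g. 0 = "left of", 1 = "right of"
acts = rng.normal(size=(n, d)) + np.outer(2 * labels - 1, direction)

X_tr, y_tr = acts[:300], labels[:300]
X_te, y_te = acts[300:], labels[300:]

# Least-squares linear probe: regress +/-1 targets on activations,
# then classify held-out examples by the sign of the projection.
w, *_ = np.linalg.lstsq(X_tr, 2 * y_tr - 1, rcond=None)
pred = (X_te @ w > 0).astype(int)
acc = (pred == y_te).mean()
print(f"probe accuracy: {acc:.2f}")  # well above the 0.5 chance level
```

High probe accuracy shows the information is linearly decodable from the activations, but — as the paper stresses — decodability alone does not show the model actually uses that information, which is why causal interventions are needed as a complement.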

Abstract

As spatial intelligence becomes an increasingly important capability for foundation models, it remains unclear whether large language models' (LLMs) performance on spatial reasoning benchmarks reflects structured internal spatial representations or reliance on linguistic heuristics. We address this question from a mechanistic perspective by examining how spatial information is internally represented and used. Drawing on computational theories of human spatial cognition, we decompose spatial reasoning into three primitives: relational composition, representational transformation, and stateful spatial updating, and design controlled task families for each. We evaluate multilingual LLMs in English, Chinese, and Arabic under single-pass inference, and analyze internal representations using linear probing, sparse-autoencoder-based feature analysis, and causal interventions. We find that task-relevant spatial information is encoded in intermediate layers and can causally influence behavior, but these representations are transient, fragmented across task families, and weakly integrated into final predictions. Cross-linguistic analysis further reveals mechanistic degeneracy, where similar behavioral performance arises from distinct internal pathways. Overall, our results suggest that current LLMs exhibit limited and context-dependent spatial representations rather than robust, general-purpose spatial reasoning, highlighting the need for mechanistic evaluation beyond benchmark accuracy.
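The causal interventions the abstract refers to are commonly implemented as activation patching: cache an intermediate activation from one run and splice it into another, then measure how the output logits shift. The sketch below uses a tiny two-layer network as a stand-in for a transformer layer; the weights, inputs, and the choice of which hidden units to patch are all illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(8, 8))
W2 = rng.normal(size=(2, 8))  # logits over two "spatial" answers

def layer1(x):
    return np.tanh(W1 @ x)

def layer2(h):
    return W2 @ h

# Two hypothetical prompts encoding opposite spatial relations.
x_left, x_right = rng.normal(size=8), rng.normal(size=8)

h_clean = layer1(x_left)
h_patched = h_clean.copy()
h_patched[:4] = layer1(x_right)[:4]  # patch half the hidden units from the other run

# Causal effect of the patched units, read off as a logit difference.
delta = layer2(h_patched) - layer2(h_clean)
print("logit shift from patching:", delta)
```

A nonzero logit shift indicates the patched units carry information the model's output depends on; in the paper's framing, weak or inconsistent shifts across task families are what motivate the "weakly integrated" conclusion.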