Do LLMs Build Spatial World Models? Evidence from Grid-World Maze Tasks
arXiv cs.AI / 4/14/2026
💬 Opinion · Models & Research
Key Points
- The study tests whether LLMs can build internal spatial world models by using controlled grid-world maze tasks that require multi-step planning and spatial abstraction.
- Across Gemini-2.5-Flash, GPT-5-mini, Claude-Haiku-4.5, and DeepSeek-Chat, accuracy drops sharply when the same mazes are presented as visual grid formats (16–34%) instead of tokenized adjacency representations (80–86% on small grids), indicating substantial failures in spatial reasoning.
- Follow-up probes using sequential proximity and compositional distance questions show that high semantic coverage in reasoning traces (96–99%) does not translate into reliable spatial computations, suggesting the models do not accumulate spatial knowledge across steps.
- The authors conclude that LLM spatial reasoning is representation- and prompting-dependent, succeeding only in narrow conditions rather than forming robust, format-invariant spatial world models.
- The findings raise concerns for deploying foundation models in applications that rely on consistent spatial abstraction for planning and reasoning.
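To make the format contrast concrete, here is a minimal sketch (not the paper's actual harness; the maze, wall placement, and encodings are invented for illustration) of the same small grid-world rendered in the two prompt styles the study compares: a tokenized adjacency list versus a visual ASCII grid.

```python
# Hypothetical 3x3 grid-world: two blocked edges chosen for illustration.
WALLS = {((0, 1), (1, 1)), ((1, 1), (1, 2))}

def blocked(a, b):
    """True if the edge between cells a and b is walled off."""
    return (a, b) in WALLS or (b, a) in WALLS

def adjacency_encoding(n=3):
    """Tokenized adjacency format: list every passable edge as a text pair."""
    edges = []
    for r in range(n):
        for c in range(n):
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbors
                nr, nc = r + dr, c + dc
                if nr < n and nc < n and not blocked((r, c), (nr, nc)):
                    edges.append(f"({r},{c})-({nr},{nc})")
    return "adjacent: " + ", ".join(edges)

def visual_encoding(n=3):
    """Visual grid format: cells as '.', open passages as '-'/'|', walls as '#'."""
    size = 2 * n - 1
    g = [[" "] * size for _ in range(size)]
    for r in range(n):
        for c in range(n):
            g[2 * r][2 * c] = "."
    for r in range(n):
        for c in range(n):
            for dr, dc in ((0, 1), (1, 0)):
                nr, nc = r + dr, c + dc
                if nr < n and nc < n:
                    open_ch = "-" if dr == 0 else "|"
                    ch = "#" if blocked((r, c), (nr, nc)) else open_ch
                    g[2 * r + dr][2 * c + dc] = ch
    return "\n".join("".join(row) for row in g)

if __name__ == "__main__":
    print(adjacency_encoding())
    print(visual_encoding())
```

Both strings describe an identical maze, so a model with a format-invariant spatial world model should answer reachability and distance questions equally well from either; the reported 80–86% vs 16–34% gap is what motivates the paper's conclusion that performance is representation-dependent.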