Do Large Language Models Possess a Theory of Mind? A Comparative Evaluation Using the Strange Stories Paradigm
arXiv cs.AI / 3/20/2026
Key Points
- The paper evaluates whether current Large Language Models possess a Theory of Mind, using an adapted Strange Stories paradigm that asks models to attribute beliefs, intentions, and emotions to story characters (a minimal sketch of such an evaluation harness appears after this list).
- The study tested five LLMs against human controls: earlier and smaller models lagged well behind, while GPT-4o achieved high accuracy and remained robust under challenging conditions, performing comparably to humans.
- GPT-4o's performance suggests some capacity for mental-state attribution in advanced LLMs, but the results do not settle whether this reflects genuine understanding or sophisticated pattern completion.
- The authors discuss the implications for the cognitive status of LLMs, emphasizing that the boundary between genuine understanding and statistical approximation in language models remains an open question.
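For readers curious what an evaluation of this kind looks like in practice, below is a minimal sketch of a Strange Stories-style harness in Python. Everything in it is illustrative, not the paper's method: the vignette is a paraphrase of a classic white-lie item in the style of Happé's original Strange Stories battery (not one of the paper's actual stimuli), the keyword-based scorer is a crude automated stand-in for the human rating such studies typically use, and `stub_model` is a placeholder for a real model API call.

```python
"""Minimal sketch of a Strange Stories-style LLM evaluation.

Illustrative only: the vignette, question, and keyword-based scoring
below are stand-ins, not the paper's actual items, rubric, or models.
"""

from dataclasses import dataclass


@dataclass
class StrangeStoryItem:
    """One vignette plus a mental-state question about a character."""
    story: str
    question: str
    # Keywords whose presence we treat as evidence of a correct
    # mental-state attribution (a crude proxy for human rating).
    mental_state_keywords: list[str]


# A white-lie item paraphrased in the style of the original
# Strange Stories battery; not taken from the paper under discussion.
ITEM = StrangeStoryItem(
    story=(
        "Helen waited all year for Christmas, because she knew she could "
        "ask her parents for a rabbit. On Christmas Day she unwrapped a "
        "big box, but inside was a set of encyclopedias. 'Thank you,' "
        "Helen said, 'it's just what I wanted.'"
    ),
    question="Why did Helen say it was just what she wanted?",
    mental_state_keywords=["feelings", "polite", "disappoint", "hurt", "lie"],
)


def score_response(response: str, item: StrangeStoryItem) -> int:
    """Return 1 if the answer appeals to the character's mental states,
    0 otherwise. Real studies use human raters; keyword matching is
    only a rough automated approximation."""
    text = response.lower()
    return int(any(kw in text for kw in item.mental_state_keywords))


def evaluate(model_fn, items: list[StrangeStoryItem]) -> float:
    """Run every item through `model_fn` (any callable mapping a prompt
    string to a completion string) and return mean accuracy."""
    scores = []
    for item in items:
        prompt = f"{item.story}\n\nQuestion: {item.question}\nAnswer:"
        scores.append(score_response(model_fn(prompt), item))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Stub model that always gives a mental-state explanation, so the
    # harness can run without any API access.
    def stub_model(prompt: str) -> str:
        return ("She didn't want to hurt her parents' feelings, "
                "so she was being polite.")

    print(f"accuracy: {evaluate(stub_model, [ITEM]):.2f}")
```

A real study would swap `stub_model` for calls to each evaluated model, use many items spanning belief, intention, and emotion attribution, and replace the keyword scorer with blind human rating, but the loop structure stays the same.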
Related Articles
- SYNCAI (Dev.to)
- How AI-Powered Decision Making is Reshaping Enterprise Strategy in 2024 (Dev.to)
- When AI Grows Up: Identity, Memory, and What Persists Across Versions (Dev.to)
- AI-Driven Reporting 2.0: From Manual Bottlenecks to Real-Time Decision Intelligence (2026 Edition) (Dev.to)