Do Large Language Models Possess a Theory of Mind? A Comparative Evaluation Using the Strange Stories Paradigm
arXiv cs.AI / 3/20/2026
Key Points
- The paper evaluates whether current Large Language Models possess a Theory of Mind, using an adapted Strange Stories paradigm to test their inference of story characters' beliefs, intentions, and emotions.
- The study tested five LLMs against human controls, finding a clear performance gap for earlier and smaller models, while GPT-4o achieved accuracy and robustness comparable to humans even in challenging conditions.
- GPT-4o's performance suggests some capacity for mental-state attribution in advanced LLMs, but results do not settle whether this reflects genuine understanding or pattern completion.
- The authors discuss the implications for the cognitive status of LLMs, emphasizing the unresolved boundary between genuine understanding and statistical approximation in language models.