CoMMET: To What Extent Can LLMs Perform Theory of Mind Tasks?
arXiv cs.CL / 3/13/2026
Key Points
- CoMMET is a new multimodal benchmark dataset designed to evaluate Theory of Mind in LLMs, extending assessment beyond text inputs.
- It introduces multi-turn testing inspired by the Theory of Mind Booklet Task, and is reportedly the first multimodal ToM benchmark of its kind.
- The study evaluates multiple LLM families and sizes to analyze strengths and limitations and to identify directions for future improvement.
- By probing social cognitive abilities, CoMMET aims to enable more natural and effective human-AI interactions.
- The release gives the AI research community a resource for benchmarking ToM-related performance across modalities and conversational turns.