CoMMET: To What Extent Can LLMs Perform Theory of Mind Tasks?
arXiv cs.CL / 3/13/2026
Key Points
- CoMMET is a new multimodal benchmark dataset designed to evaluate Theory of Mind in LLMs, extending assessment beyond text inputs.
- It introduces multi-turn testing, draws inspiration from the Theory of Mind Booklet Task, and is reportedly the first multimodal ToM benchmark of its kind.
- The study evaluates multiple LLM families and sizes to analyze strengths and limitations and to identify directions for future improvement.
- By probing social cognitive abilities, CoMMET aims to enable more natural and effective human-AI interactions.
- This release provides a new resource for the AI research community to benchmark ToM-related performance across modalities and conversational turns.
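The multi-turn evaluation described above can be sketched in miniature. This is a hypothetical illustration, not the actual CoMMET protocol or API: the `ToMItem` schema, the exact-match scoring, and the classic false-belief example are all assumptions made for clarity.

```python
from dataclasses import dataclass

@dataclass
class ToMItem:
    """One multi-turn Theory of Mind test item (hypothetical schema)."""
    turns: list      # conversation turns shown to the model in order
    question: str    # final ToM probe (e.g., a false-belief question)
    answer: str      # gold answer

def evaluate(model, items):
    """Accumulate each item's turns as conversational context, ask the
    final question, and score the answer by case-insensitive exact match."""
    correct = 0
    for item in items:
        history = []
        for turn in item.turns:
            history.append(turn)  # context grows turn by turn
        prediction = model(history, item.question)
        correct += int(prediction.strip().lower() == item.answer.strip().lower())
    return correct / len(items)

# Stub "model" standing in for an LLM call, for illustration only.
stub = lambda history, question: "the basket"

items = [
    ToMItem(
        turns=["Sally puts the marble in the basket.",
               "Sally leaves; Anne moves the marble to the box."],
        question="Where will Sally look for the marble?",
        answer="the basket",
    )
]
print(evaluate(stub, items))  # 1.0
```

A real harness would replace the stub with API calls to each LLM under test and use a more robust answer matcher, but the turn-accumulation loop captures the basic multi-turn setup.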