DialToM: A Theory of Mind Benchmark for Forecasting State-Driven Dialogue Trajectories
arXiv cs.CL / 4/23/2026
Key Points
- The paper introduces DialToM, a human-verified benchmark for evaluating how well LLMs perform Theory of Mind (ToM) in state-driven dialogue forecasting.
- It assesses both Literal ToM (predicting mental states) and Functional ToM (using those states to select state-consistent dialogue trajectories) via Prospective Diagnostic Forecasting.
- Results show an asymmetry in reasoning: models can identify mental states well, but most fail to use that understanding to forecast social dialogue trajectories, with Gemini 3 Pro as a notable exception.
- The study finds only weak semantic alignment between human and LLM-generated inferences.
- The authors release the DialToM dataset and evaluation code publicly to support reproducibility.
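
To make the Literal/Functional distinction concrete, here is a minimal sketch of how the reported asymmetry could be measured. It is not the authors' evaluation code: the `Item` fields, the multiple-choice framing, and the `tom_scores` helper are all hypothetical, assuming each benchmark item pairs a mental-state question with a choice among candidate dialogue trajectories.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Item:
    """Hypothetical benchmark item; field names are illustrative only."""
    literal_gold: str      # annotated mental state, e.g. "frustrated"
    literal_pred: str      # model's predicted mental state
    functional_gold: int   # index of the state-consistent trajectory
    functional_pred: int   # index of the trajectory the model selected


def accuracy(pairs: Iterable[Tuple[object, object]]) -> float:
    """Fraction of (gold, predicted) pairs that match exactly."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(gold == pred for gold, pred in pairs) / len(pairs)


def tom_scores(items: list) -> dict:
    """Score Literal and Functional ToM separately and report the gap.

    A large positive gap means the model names mental states correctly
    but does not use them to pick the matching dialogue trajectory.
    """
    literal = accuracy((i.literal_gold, i.literal_pred) for i in items)
    functional = accuracy((i.functional_gold, i.functional_pred) for i in items)
    return {"literal": literal, "functional": functional, "gap": literal - functional}


if __name__ == "__main__":
    # Toy data invented for illustration, not drawn from DialToM.
    demo = [
        Item("frustrated", "frustrated", 2, 0),
        Item("hopeful", "hopeful", 1, 1),
        Item("anxious", "anxious", 0, 3),
        Item("relieved", "calm", 2, 1),
    ]
    print(tom_scores(demo))
```

Scoring the two abilities on the same items, rather than on separate test sets, is what lets the gap be read as a failure to apply an inference the model has already made correctly.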