Theory of Mind in Action: The Instruction Inference Task in Dynamic Human-Agent Collaboration
arXiv cs.CL / 4/20/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper studies how large language model (LLM) agents can infer a human principal’s unspoken intentions when instructions are incomplete or ambiguous, treating this as a “Theory of Mind” (ToM) capability.
- It introduces a new evaluation benchmark/task called “Instruction Inference,” designed to test ToM in dynamic, goal-oriented human-agent collaboration.
- The authors propose “Tomcat,” an LLM-based agent with two variants: Fs-CoT (few-shot structured chain-of-thought examples) and CP (commonsense-prompt-based reasoning); a prompt sketch follows this list.
- Tomcat is instantiated with GPT-4o, DeepSeek-R1, and Gemma-3-27B, and is compared against 52 human participants in a user study in which the humans are given the same information as the CP variant.
- Results show that Tomcat with Fs-CoT, particularly when run on GPT-4o and DeepSeek-R1, performs comparably to the human participants on intent accuracy, action optimality, and planning optimality (also sketched below), suggesting that LLM agents have meaningful ToM potential for human-agent teaming.
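
To make the two variants concrete, here is a minimal sketch in Python of how the Fs-CoT and CP prompts might be assembled for an Instruction Inference query. The example instruction, context, field names, and wording are illustrative assumptions, not the paper's actual prompts.

```python
# Hypothetical sketch of the two Tomcat prompting variants.
# All example text and field names below are assumptions for illustration.

FEW_SHOT_COT_EXAMPLES = [
    {
        "instruction": "Grab the thing near the charging station.",
        "context": "A screwdriver lies on the workbench; a drone battery "
                   "sits next to the charging station.",
        "reasoning": "The principal is assembling a drone and batteries are "
                     "kept by the charger, so 'the thing' most likely refers "
                     "to the drone battery.",
        "inferred_intent": "Fetch the drone battery.",
    },
]


def build_fs_cot_prompt(instruction: str, context: str) -> str:
    """Fs-CoT variant: prepend worked examples with explicit reasoning steps."""
    parts = ["Infer the principal's intended goal from the ambiguous instruction."]
    for ex in FEW_SHOT_COT_EXAMPLES:
        parts.append(
            f"Instruction: {ex['instruction']}\n"
            f"Context: {ex['context']}\n"
            f"Reasoning: {ex['reasoning']}\n"
            f"Inferred intent: {ex['inferred_intent']}"
        )
    parts.append(f"Instruction: {instruction}\nContext: {context}\nReasoning:")
    return "\n\n".join(parts)


def build_cp_prompt(instruction: str, context: str) -> str:
    """CP variant: no worked examples, only a commonsense-reasoning nudge."""
    return (
        "Use commonsense knowledge about the principal's goals and the "
        "environment to infer what they actually want.\n\n"
        f"Instruction: {instruction}\nContext: {context}\nInferred intent:"
    )


if __name__ == "__main__":
    print(build_fs_cot_prompt(
        "Bring me the part I need next.",
        "The principal is repairing a bicycle; a wrench and a tire lever lie nearby.",
    ))
```

On this reading, the only substantive difference between the two variants is whether worked reasoning traces are placed in the context before the query.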
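The three reported metrics can be read as comparisons against gold annotations. The sketch below assumes exact-match intent scoring, a set of gold-optimal actions per episode, and a plan-length ratio for planning optimality; these definitions are assumptions for illustration, not the paper's scoring protocol.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    # Field semantics are assumptions, not the paper's annotation scheme.
    predicted_intent: str
    gold_intent: str
    chosen_action: str
    optimal_actions: set[str]   # actions judged optimal for this step
    plan_length: int            # number of steps in the agent's plan
    optimal_plan_length: int    # shortest plan that achieves the gold intent


def intent_accuracy(episodes: list[Episode]) -> float:
    """Fraction of episodes where the inferred intent matches the gold intent."""
    return sum(e.predicted_intent == e.gold_intent for e in episodes) / len(episodes)


def action_optimality(episodes: list[Episode]) -> float:
    """Fraction of episodes where the chosen action is one of the optimal ones."""
    return sum(e.chosen_action in e.optimal_actions for e in episodes) / len(episodes)


def planning_optimality(episodes: list[Episode]) -> float:
    """Mean ratio of optimal plan length to the agent's plan length (1.0 = optimal)."""
    return sum(e.optimal_plan_length / e.plan_length for e in episodes) / len(episodes)


if __name__ == "__main__":
    eps = [Episode("fetch battery", "fetch battery", "pick_up_battery",
                   {"pick_up_battery"}, 3, 3)]
    print(intent_accuracy(eps), action_optimality(eps), planning_optimality(eps))
```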