When Contextual Inference Fails: Cancelability in Interactive Instruction Following
arXiv cs.CL / 3/23/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper introduces Build What I Mean (BWIM), an interactive benchmark for contextual meaning construction in guided instruction following.
- BWIM extends a two-speaker psycholinguistic paradigm to compare contextual inference versus literal adherence under a small communication cost.
- Evaluations of state-of-the-art LLMs reveal a dissociation between judgment and action: models can detect speaker unreliability in their confidence judgments, but they do not consistently act on that signal by requesting clarification.
- Consequently, models fall into suboptimal strategies, such as partner-blind over-clarification and question-averse guessing under uncertainty, highlighting a gap between understanding and action.
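The clarify-versus-guess trade-off the benchmark probes can be thought of in expected-utility terms: asking is worthwhile only when the value of resolving the ambiguity outweighs the communication cost. The sketch below is an illustrative simplification; the function names, payoffs, and the 0.1 question cost are assumptions for exposition, not the paper's actual setup.

```python
# Illustrative sketch of the clarify-vs-guess decision (all values are assumptions).

def value_of_guessing(p_correct: float, reward: float = 1.0, penalty: float = 0.0) -> float:
    """Expected payoff from acting immediately on the current best interpretation."""
    return p_correct * reward + (1.0 - p_correct) * penalty

def value_of_asking(reward: float = 1.0, question_cost: float = 0.1) -> float:
    """Expected payoff from asking first, assuming the answer fully resolves
    the ambiguity, minus a small communication cost."""
    return reward - question_cost

def should_ask(p_correct: float, reward: float = 1.0, penalty: float = 0.0,
               question_cost: float = 0.1) -> bool:
    """A calibrated agent asks only when clarification beats guessing in expectation."""
    return value_of_asking(reward, question_cost) > value_of_guessing(p_correct, reward, penalty)

print(should_ask(p_correct=0.5))   # uncertain  -> True  (ask)
print(should_ask(p_correct=0.95))  # confident  -> False (guess)
```

Under this toy rule, the failure modes the paper reports correspond to ignoring `p_correct` entirely: over-clarification is asking even when confidence is high, and question-averse guessing is acting even when confidence is low.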