MISID: A Multimodal Multi-turn Dataset for Complex Intent Recognition in Strategic Deception Games
arXiv cs.AI / 4/15/2026
Key Points
- MISID is introduced as a new multimodal, multi-turn, multi-participant benchmark dataset aimed at recognizing complex human intent in strategic deception games, addressing limitations of prior single-utterance or simple-dialogue datasets.
- The dataset includes fine-grained, two-tier multi-dimensional annotations designed for long-context discourse analysis and evidence-based causal tracking across extended interactions.
- An evaluation of state-of-the-art Multimodal Large Language Models on MISID finds key weaknesses in complex scenarios, including text-prior visual hallucinations, weak cross-modal synergy, and limited ability to chain causal cues.
- To mitigate these issues, the authors propose FRACTAM, a baseline framework using a “Decouple-Anchor-Reason” approach to reduce text bias, perform two-stage retrieval for long-range factual anchoring, and build explicit cross-modal evidence chains.
- Experiments report that FRACTAM improves performance on complex strategic tasks, strengthening hidden-intent detection and inference while preserving robust perceptual accuracy. The dataset is publicly available online.
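
The summary above does not say how FRACTAM's two-stage retrieval is implemented, so the following is only a minimal sketch of the general idea of two-stage retrieval over a long multi-turn dialogue, not the authors' method. It assumes a cheap lexical-overlap filter as the first stage and a cosine-similarity rerank as the second. The names `Turn`, `coarse_recall`, `rerank`, and `retrieve_anchors`, and the sample dialogue, are hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Turn:
    index: int    # position of the turn in the dialogue
    speaker: str
    text: str


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def coarse_recall(query: str, turns: list[Turn], k: int) -> list[Turn]:
    # Stage 1 (assumed): keep the k turns sharing the most distinct tokens
    # with the query; turns with no overlap are dropped.
    q = set(tokenize(query))
    scored = [(len(q & set(tokenize(t.text))), t) for t in turns]
    scored = [(s, t) for s, t in scored if s > 0]
    scored.sort(key=lambda st: (-st[0], st[1].index))
    return [t for _, t in scored[:k]]


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(query: str, candidates: list[Turn], n: int) -> list[Turn]:
    # Stage 2 (assumed): rescore only the stage-1 candidates with a finer
    # similarity measure and keep the best n.
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(t.text))), t) for t in candidates]
    scored.sort(key=lambda st: (-st[0], st[1].index))
    return [t for _, t in scored[:n]]


def retrieve_anchors(query: str, turns: list[Turn], k: int = 3, n: int = 2) -> list[Turn]:
    # Chain the two stages: broad recall first, precise rerank second.
    return rerank(query, coarse_recall(query, turns, k), n)


# Hypothetical dialogue from a deception game, for illustration only.
dialogue = [
    Turn(0, "P1", "I was at the library all night"),
    Turn(1, "P2", "I saw P1 near the kitchen at midnight"),
    Turn(2, "P3", "I think we should vote now"),
    Turn(3, "P1", "I never left the library"),
    Turn(4, "P2", "The kitchen light was on at midnight"),
]

for turn in retrieve_anchors("Where was P1 at midnight", dialogue):
    print(turn.index, turn.speaker, turn.text)
```

In a real system the second stage would more plausibly use a learned text or cross-modal encoder than term-frequency cosine. The point of the sketch is the split itself: the cheap first stage narrows a long history so the costlier second stage only scores a handful of candidate turns.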