Attention at Rest Stays at Rest: Breaking Visual Inertia for Cognitive Hallucination Mitigation
arXiv cs.CV / 4/3/2026
Key Points
- The paper reports that multimodal LLMs can exhibit “visual attention inertia,” where visual attention stays largely static after early decoding steps and does not support the compositional reasoning needed for cognitive inference.
- It argues that many existing hallucination mitigation approaches focus on perceptual hallucinations (e.g., whether an object exists or its attributes) and do not adequately address cognitive hallucinations requiring relational deduction between objects.
- Using token-wise attention analysis, the authors identify visual inertia (attention that stays persistently focused on a few semantically critical regions) as a key driver of this failure to perform inter-object relational inference; a toy measure of this effect is sketched below the list.
- They propose a training-free Inertia-aware Visual Excitation (IVE) method that dynamically selects emerging visual tokens and applies an inertia-aware penalty to reduce over-concentration and attention persistence in localized regions; a sketch of this penalty idea also follows the list.
- Experimental results indicate that IVE improves mitigation of cognitive hallucinations across multiple base MLLMs and several hallucination benchmarks.
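
To make "visual attention inertia" concrete, here is a minimal sketch, not the authors' code, of one plausible way to quantify it from per-step attention maps. It assumes access to the model's attention weights over N visual tokens at each decoding step (e.g., averaged across heads), and uses the IoU of the top-k attended token sets between consecutive steps as a hypothetical inertia measure; the paper's actual metric may differ.

```python
import torch


def topk_iou(a: torch.Tensor, b: torch.Tensor, k: int = 32) -> float:
    """IoU of the top-k visual-token index sets of two attention vectors."""
    sa = set(a.topk(k).indices.tolist())
    sb = set(b.topk(k).indices.tolist())
    return len(sa & sb) / len(sa | sb)


def inertia_score(attn_per_step: torch.Tensor, k: int = 32) -> float:
    """Mean top-k overlap between consecutive decoding steps.

    attn_per_step: (T, N) attention over N visual tokens at T decoding steps.
    A score near 1.0 means the attended region barely moves (high inertia).
    """
    overlaps = [
        topk_iou(attn_per_step[t], attn_per_step[t + 1], k)
        for t in range(attn_per_step.shape[0] - 1)
    ]
    return float(sum(overlaps) / len(overlaps))


# Toy example: a frozen attention map scores ~1.0 (high inertia),
# while attention that moves every step scores much lower.
torch.manual_seed(0)
static = torch.rand(1, 576).repeat(10, 1)   # same map at every step
shifting = torch.rand(10, 576)              # fresh map at every step
print(inertia_score(static))
print(inertia_score(shifting))
```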
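And here is a minimal sketch of the idea behind an inertia-aware penalty, assuming IVE-style logic: visual tokens whose attention has stayed high over recent steps are damped so that newly relevant ("emerging") tokens can surface. The running-average history, the `penalty_strength` hyperparameter, and the softmax re-normalization are illustrative assumptions, not the paper's exact formulation.

```python
import torch


def inertia_penalized_attention(
    attn_logits: torch.Tensor,      # (N,) current-step logits over visual tokens
    attn_history: torch.Tensor,     # (N,) running average of past attention
    penalty_strength: float = 2.0,  # hypothetical hyperparameter
) -> torch.Tensor:
    """Penalize tokens with persistently high past attention, then renormalize."""
    penalized = attn_logits - penalty_strength * attn_history
    return torch.softmax(penalized, dim=-1)


# Usage: keep an exponential moving average of attention across decoding steps,
# so the penalty tracks which regions the model has been dwelling on.
N, T = 576, 10
torch.manual_seed(0)
history = torch.zeros(N)
for _ in range(T):
    logits = torch.randn(N)               # stand-in for real attention logits
    attn = inertia_penalized_attention(logits, history)
    history = 0.9 * history + 0.1 * attn  # update the inertia estimate
```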