SMSP: A Plug-and-Play Strategy of Multi-Scale Perception for MLLMs to Perceive Visual Illusions
arXiv cs.CV / 3/25/2026
Key Points
- The paper reports that multimodal large language models (MLLMs) can fail on hidden-pattern visual illusions that are obvious to humans, indicating a perceptual misalignment with human vision and raising safety concerns.
- It introduces IlluChar, a comprehensive illusion dataset, and identifies a key failure mechanism: a high-frequency attention bias, in which models fixate on textured backgrounds and overlook the hidden content.
- To mitigate this, the authors propose SMSP (Strategy of Multi-Scale Perception), a plug-and-play framework that suppresses distracting high-frequency background information to better align model perception with humans.
- Experiments show SMSP substantially boosts performance across evaluated MLLMs on illusion images, including a large jump in Qwen3-VL-8B-Instruct accuracy from 13.0% to 84.0%.
- The authors make the code publicly available, positioning SMSP as a practical and robust approach for improving MLLM visual perception.
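The core intuition — suppressing distracting high-frequency background detail so the hidden low-frequency pattern stands out — can be sketched with simple multi-scale average pooling. This is an illustrative sketch only, not the authors' SMSP implementation; the function name `multiscale_views` and the pooling factors are assumptions for demonstration.

```python
import numpy as np

def multiscale_views(img: np.ndarray, factors=(1, 2, 4)) -> list[np.ndarray]:
    """Return low-pass views of a 2-D grayscale image at several scales.

    Averaging over k x k blocks acts as a crude low-pass filter: it
    suppresses high-frequency texture (the distracting background detail)
    while preserving coarse structure. Hypothetical stand-in for the
    multi-scale perception idea described in the summary.
    """
    views = []
    h, w = img.shape
    for k in factors:
        # Crop so dimensions divide evenly, then average each k x k block.
        hc, wc = (h // k) * k, (w // k) * k
        pooled = img[:hc, :wc].reshape(hc // k, k, wc // k, k).mean(axis=(1, 3))
        views.append(pooled)
    return views

# A high-frequency checkerboard collapses to its mean at coarser scales,
# while any low-frequency hidden pattern would survive the pooling.
checker = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)
views = multiscale_views(checker)
```

In this toy case the 2x2-pooled view is uniformly 0.5 (each block averages two zeros and two ones), showing how fine texture is removed while coarse content would remain.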