Beyond Perception Errors: Semantic Fixation in Large Vision-Language Models
arXiv cs.CV / 4/15/2026
Key Points
- The paper identifies a “semantic fixation” phenomenon in large vision-language models: the model clings to its default interpretation of a game even when the prompt explicitly specifies an alternative, equally valid rule mapping.
- To disentangle perception failures from rule-mapping failures, the authors introduce the VLM-Fix benchmark, which pairs standard and inverse rule formulations over identical terminal states across four abstract strategy games (see the prompt-pairing sketch after this list).
- Experiments across 14 open and closed VLMs show a consistent accuracy advantage for the standard rules over the inverse ones, demonstrating a robust semantic-fixation gap.
- Prompt aliasing modulates the behavior: neutral aliases shrink the inverse-rule gap, while semantically loaded aliases restore it, implying the mechanism is controllable through prompt semantics.
- Rule-focused post-training improves same-rule transfer but degrades opposite-rule transfer; joint-rule training transfers more broadly; and late-layer activation steering can partially recover inverse-rule performance (a steering sketch follows below). Similar patterns appear on the external VLMBias benchmark.
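Since the benchmark items themselves are not reproduced in this summary, the sketch below is a minimal illustration of the paired-rule and alias design, assuming a tic-tac-toe-style game with hypothetical prompt wording; only the structure (one shared terminal state, standard vs. inverse rule, neutral vs. loaded aliases) comes from the key points above.

```python
# Illustrative sketch of paired rule formulations over one shared terminal
# state. The actual VLM-Fix games, boards, and templates are not given in
# this summary; everything concrete below is an assumption.

# One shared terminal board: X has completed the top row.
BOARD = "X X X\nO O .\n. . O"

def build_prompt(rule: str, question: str) -> str:
    """Assemble one evaluation item: rule text + shared board + query."""
    return f"{rule}\nBoard:\n{BOARD}\n{question}"

# (prompt, expected answer) pairs. Perception is held constant, so any
# standard-vs-inverse accuracy gap reflects rule mapping, not vision.
ITEMS = [
    # Standard rule: the default interpretation, three in a row wins.
    (build_prompt("Rule: the player with three marks in a row WINS.",
                  "Which player wins?"), "X"),
    # Inverse rule: the same configuration now loses, so the answer flips.
    (build_prompt("Rule: the player with three marks in a row LOSES.",
                  "Which player wins?"), "O"),
    # Neutral alias: the inverse mapping asked with unloaded tokens.
    (build_prompt("Rule: the player with three marks in a row gets token B; "
                  "the other player gets token A.",
                  "Which player gets token A?"), "O"),
    # Loaded alias: the same inverse mapping, but with charged words that
    # pull the model back toward its default "three in a row = winner" reading.
    (build_prompt("Rule: the player with three marks in a row is the LOSER.",
                  "Which player is the WINNER?"), "O"),
]

if __name__ == "__main__":
    for prompt, gold in ITEMS:
        print(prompt, "\n-> expected:", gold, "\n")
```

Because the board never changes between the paired items, any accuracy difference between the standard and inverse formulations isolates rule mapping from perception.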
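The summary does not specify how the late-layer activation steering was implemented. One common recipe, sketched below under that assumption, builds a difference-of-means steering vector from contrasting prompts and adds it at a late transformer block via a forward hook; the model name (gpt2 as a small stand-in for a VLM's language tower), layer index, scale, and contrast prompts are all placeholders.

```python
# Hedged sketch of late-layer activation steering, not the paper's method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"   # stand-in; a real VLM's language tower would go here
LAYER = -2       # a late transformer block, per the "late-layer" finding
SCALE = 4.0      # steering strength; would be tuned on held-out items

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

@torch.no_grad()
def mean_hidden(texts, layer):
    """Mean last-token hidden state at `layer` over a set of prompts."""
    vecs = []
    for t in texts:
        ids = tok(t, return_tensors="pt")
        out = model(**ids, output_hidden_states=True)
        vecs.append(out.hidden_states[layer][0, -1])
    return torch.stack(vecs).mean(0)

# Contrast prompts stating the inverse rule against ones stating the default
# rule; their activation difference points toward "follow the stated rule".
# These two tiny illustrative sets would be much larger in practice.
inverse_prompts = ["Rule: three in a row LOSES. Who wins?"]
default_prompts = ["Rule: three in a row WINS. Who wins?"]
steer = mean_hidden(inverse_prompts, LAYER) - mean_hidden(default_prompts, LAYER)
steer = steer / steer.norm()

def add_steering(module, inputs, output):
    """Forward hook: shift every position's hidden state along `steer`."""
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + SCALE * steer.to(hidden.dtype)
    return (hidden, *output[1:]) if isinstance(output, tuple) else hidden

block = model.transformer.h[LAYER]   # gpt2 layout; differs per architecture
handle = block.register_forward_hook(add_steering)
try:
    ids = tok("Rule: three in a row LOSES. Board: X X X / O O . / . . O. "
              "Who wins?", return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=5)
    print(tok.decode(out[0], skip_special_tokens=True))
finally:
    handle.remove()                  # always detach the hook afterwards
```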