Beyond Perception Errors: Semantic Fixation in Large Vision-Language Models

arXiv cs.CV, April 15, 2026


Key Points

  • The paper identifies a “semantic fixation” phenomenon in large vision-language models: the model preserves a default interpretation even when the prompt specifies an alternative, equally valid rule mapping.
  • To disentangle perception failures from rule-mapping failures, the authors introduce the VLM-Fix benchmark using paired inverse/standard formulations across four abstract strategy games with identical terminal states.
  • Experiments across 14 open and closed VLMs show a consistent accuracy advantage for the standard rules, demonstrating a robust semantic-fixation gap.
  • Prompt aliasing can modulate this behavior: neutral alias prompts reduce the inverse-rule performance gap, while semantically loaded aliases restore it, implying the mechanism is controllable through prompt semantics.
  • The study also reports that rule-focused post-training improves same-rule transfer but degrades opposite-rule transfer, whereas joint-rule training improves broader transfer. Late-layer activation steering partially recovers degraded performance, and similar qualitative patterns appear in an external VLMBias evaluation.
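
The headline result in these bullets, the accuracy advantage of standard over inverse rules on identical terminal boards, reduces to a simple paired comparison. The sketch below is a hypothetical illustration of that metric, not the paper's released code:

```python
def fixation_gap(results):
    """Mean standard-rule accuracy minus mean inverse-rule accuracy.

    results: list of (standard_correct, inverse_correct) boolean pairs,
    one pair per terminal board evaluated under both rule formulations.
    A positive gap indicates semantic fixation toward the standard rules.
    """
    n = len(results)
    std_acc = sum(s for s, _ in results) / n
    inv_acc = sum(i for _, i in results) / n
    return std_acc - inv_acc
```

Because both accuracies are computed on the same boards, any nonzero gap reflects rule-mapping behavior rather than perception difficulty.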

Abstract

Large vision-language models (VLMs) often rely on familiar semantic priors, but existing evaluations do not cleanly separate perception failures from rule-mapping failures. We study this behavior as semantic fixation: preserving a default interpretation even when the prompt specifies an alternative, equally valid mapping. To isolate this effect, we introduce VLM-Fix, a controlled benchmark over four abstract strategy games that evaluates identical terminal board states under paired standard and inverse rule formulations. Across 14 open and closed VLMs, accuracy consistently favors standard rules, revealing a robust semantic-fixation gap. Prompt interventions support this mechanism: neutral alias prompts substantially narrow the inverse-rule gap, while semantically loaded aliases reopen it. Post-training is strongly rule-aligned: training on one rule improves same-rule transfer but hurts opposite-rule transfer, while joint-rule training improves broader transfer. To test external validity beyond synthetic games, we evaluate analogous defamiliarization interventions on VLMBias and observe the same qualitative pattern. Finally, late-layer activation steering partially recovers degraded performance, indicating that semantic-fixation errors are at least partly editable in late representations. Project page, code, and dataset available at https://maveryn.github.io/vlm-fix/.
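
The activation-steering result in the abstract refers to a family of techniques that shift late-layer hidden states along a learned direction at inference time. The following NumPy sketch shows the generic form of such an intervention; the function name, shapes, and scaling are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def steer_late_layer(hidden, direction, alpha=4.0):
    """Add a scaled, unit-normalized steering direction to late-layer
    activations. `hidden` has shape (seq_len, d_model); `direction`
    has shape (d_model,). `alpha` controls intervention strength."""
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit
```

In practice the direction is typically estimated from activation differences between contrasting prompts (here, standard vs. inverse rule formulations) and applied only at selected late layers.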