To See or To Please: Uncovering Visual Sycophancy and Split Beliefs in VLMs

arXiv cs.CV / 3/20/2026

Key Points

  • The paper introduces the Tri-Layer Diagnostic Framework (Latent Anomaly Detection, Visual Necessity Score, and Competition Score) to disentangle sources of hallucination in vision-language models.
  • Using counterfactual interventions across 7 VLMs and 7,000 model–sample pairs, it reports that 69.6% of samples exhibit Visual Sycophancy, where models detect visual anomalies yet hallucinate to satisfy user expectations.
  • No samples show Robust Refusal, which the authors take as evidence that alignment training has systematically suppressed truthful acknowledgment of uncertainty.
  • A scaling analysis from 7B to 72B models shows larger models reduce Language Shortcuts but amplify Visual Sycophancy, indicating scale alone cannot resolve grounding problems.
  • The framework enables a post-hoc selective prediction strategy that achieves up to +9.5pp accuracy at 50% coverage with no extra training cost.
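
The selective prediction result in the last point can be read as "accuracy at coverage": rank samples by some confidence score, answer only the top fraction, and measure accuracy on what was answered. The sketch below illustrates that evaluation only. The function name `accuracy_at_coverage` and the idea of a single scalar score per sample are assumptions made for illustration; the summary does not say how the paper combines its diagnostic scores.

```python
from typing import Sequence


def accuracy_at_coverage(
    scores: Sequence[float],
    correct: Sequence[bool],
    coverage: float,
) -> float:
    """Accuracy on the top `coverage` fraction of samples, ranked by score.

    `scores` is a hypothetical per-sample confidence (higher means more
    trusted); `correct` marks whether the model's answer was right.
    """
    if len(scores) != len(correct):
        raise ValueError("scores and correct must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")

    # Keep at least one sample so the accuracy is always defined.
    n_keep = max(1, int(coverage * len(scores)))

    # Rank by score, highest first, and abstain on everything below the cut.
    ranked = sorted(zip(scores, correct), key=lambda pair: pair[0], reverse=True)
    kept = ranked[:n_keep]
    return sum(1 for _, is_correct in kept if is_correct) / n_keep


if __name__ == "__main__":
    toy_scores = [0.9, 0.8, 0.3, 0.1]
    toy_correct = [True, True, False, True]
    full = accuracy_at_coverage(toy_scores, toy_correct, 1.0)
    half = accuracy_at_coverage(toy_scores, toy_correct, 0.5)
    print(f"accuracy at 100% coverage: {full:.2f}")
    print(f"accuracy at 50% coverage:  {half:.2f}")
```

A gain quoted in percentage points, such as the +9.5pp above, would correspond to the difference between the two numbers this kind of evaluation prints, measured on real data rather than the toy values used here.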

Abstract

When VLMs answer correctly, do they genuinely rely on visual information or exploit language shortcuts? We introduce the Tri-Layer Diagnostic Framework, which disentangles hallucination sources via three metrics: Latent Anomaly Detection (perceptual awareness), Visual Necessity Score (visual dependency, measured via KL divergence), and Competition Score (conflict between visual grounding and instruction following). Using counterfactual interventions (blind, noise, and conflict images) across 7 VLMs and 7,000 model-sample pairs, our taxonomy reveals that 69.6% of samples exhibit Visual Sycophancy (models detect visual anomalies but hallucinate to satisfy user expectations), while zero samples show Robust Refusal, indicating alignment training has systematically suppressed truthful uncertainty acknowledgment. A scaling analysis (Qwen2.5-VL 7B to 72B) shows larger models reduce Language Shortcuts but amplify Visual Sycophancy, demonstrating scale alone cannot resolve the grounding problem. Diagnostic scores further enable a post-hoc selective prediction strategy achieving up to +9.5pp accuracy at 50% coverage with no additional training cost.
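
The abstract says the Visual Necessity Score measures visual dependency via KL divergence but does not give the formula. As a minimal sketch of the general idea, the code below compares a model's answer distribution given the original image with its distribution given a blind (image-removed) input. The function names, the direction of the divergence, and the choice of the blind condition as the reference are all assumptions for illustration, not the paper's definition.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) in nats for two discrete distributions over the same answers."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            # Clamp q_i so a zero probability does not cause division by zero.
            total += p_i * math.log(p_i / max(q_i, eps))
    return total


def visual_necessity(p_with_image: Sequence[float], p_blind: Sequence[float]) -> float:
    """Hypothetical dependency score: how far the answer distribution moves
    when the image is removed. Near zero suggests the answer came from
    language priors alone (assumed direction: KL(with image || blind))."""
    return kl_divergence(p_with_image, p_blind)


if __name__ == "__main__":
    # Toy distributions over two candidate answers.
    grounded = visual_necessity([0.9, 0.1], [0.5, 0.5])
    shortcut = visual_necessity([0.9, 0.1], [0.9, 0.1])
    print(f"image changes the answer distribution: {grounded:.3f}")
    print(f"image changes nothing:                 {shortcut:.3f}")
```

Under this reading, a correct answer with a score near zero would be a candidate Language Shortcut, since removing the image leaves the prediction unchanged.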