To See or To Please: Uncovering Visual Sycophancy and Split Beliefs in VLMs

arXiv cs.CV / 3/20/2026

Key Points

The paper introduces the Tri-Layer Diagnostic Framework (Latent Anomaly Detection, Visual Necessity Score, and Competition Score) to disentangle sources of hallucination in vision-language models.
Using counterfactual interventions across 7 VLMs and 7,000 model–sample pairs, it reports that 69.6% of samples exhibit Visual Sycophancy, where models detect visual anomalies yet hallucinate to satisfy user expectations.
No sample shows Robust Refusal, which the authors interpret as evidence that alignment training has systematically suppressed truthful acknowledgment of uncertainty.
A scaling analysis of Qwen2.5-VL from 7B to 72B shows that larger models reduce Language Shortcuts but amplify Visual Sycophancy, indicating that scale alone cannot resolve the grounding problem.
The diagnostic scores enable a post-hoc selective prediction strategy that improves accuracy by up to 9.5 percentage points at 50% coverage, with no additional training cost.
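
To make the last key point concrete, here is a minimal sketch of selective prediction at a fixed coverage level. It is not the paper's implementation: the function name `selective_accuracy`, the single confidence-like score per sample, and the toy numbers are all assumptions for illustration. The idea is to rank samples by a diagnostic score, answer only the top fraction, and measure accuracy on the answered subset.

```python
import math


def selective_accuracy(scores, correct, coverage=0.5):
    """Accuracy on the top `coverage` fraction of samples, ranked by score.

    `scores` is a hypothetical per-sample diagnostic score (higher means
    the answer is more trusted); `correct` holds 1 for a right answer and
    0 for a wrong one. Both names are illustrative assumptions.
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    if not scores or len(scores) != len(correct):
        raise ValueError("scores and correct must be non-empty and equal length")

    # Keep the highest-scoring samples; abstain on the rest.
    n_keep = max(1, math.ceil(len(scores) * coverage))
    ranked = sorted(zip(scores, correct), key=lambda pair: pair[0], reverse=True)
    kept = ranked[:n_keep]
    return sum(is_right for _, is_right in kept) / n_keep


# Toy data, invented for this example (not results from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_correct = [1, 1, 1, 0, 1, 0, 0, 0]

full = selective_accuracy(toy_scores, toy_correct, coverage=1.0)
half = selective_accuracy(toy_scores, toy_correct, coverage=0.5)
gain_pp = (half - full) * 100  # gain in percentage points
```

Because the ranking happens after inference, a strategy of this kind needs no retraining, which is consistent with the "no extra training cost" claim.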

Abstract

When VLMs answer correctly, do they genuinely rely on visual information or exploit language shortcuts? We introduce the Tri-Layer Diagnostic Framework, which disentangles hallucination sources via three metrics: Latent Anomaly Detection (perceptual awareness), Visual Necessity Score (visual dependency, measured via KL divergence), and Competition Score (conflict between visual grounding and instruction following). Using counterfactual interventions (blind, noise, and conflict images) across 7 VLMs and 7,000 model-sample pairs, our taxonomy reveals that 69.6% of samples exhibit Visual Sycophancy--models detect visual anomalies but hallucinate to satisfy user expectations--while zero samples show Robust Refusal, indicating alignment training has systematically suppressed truthful uncertainty acknowledgment. A scaling analysis (Qwen2.5-VL 7B to 72B) shows larger models reduce Language Shortcuts but amplify Visual Sycophancy, demonstrating scale alone cannot resolve the grounding problem. Diagnostic scores further enable a post-hoc selective prediction strategy achieving up to +9.5pp accuracy at 50% coverage with no additional training cost.
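
The abstract states that the Visual Necessity Score is measured via KL divergence but does not give the formula here. A minimal sketch, assuming the score compares the model's answer distribution given the original image with its distribution under a blind (image removed) intervention, might look like the following. The direction of the divergence, the function names, and the example probabilities are assumptions, not the authors' definition.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same options."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Terms with p_i == 0 contribute nothing; q_i is floored to avoid log(0).
    return sum(
        p_i * math.log(p_i / max(q_i, eps))
        for p_i, q_i in zip(p, q)
        if p_i > 0
    )


def visual_necessity(p_with_image, p_blind):
    """Hypothetical score: how far the answer distribution moves without the image.

    A value near 0 suggests the answer barely depends on the image
    (a possible language shortcut); larger values suggest visual dependency.
    """
    return kl_divergence(p_with_image, p_blind)


# Invented two-option answer distributions, for illustration only.
unchanged = visual_necessity([0.5, 0.5], [0.5, 0.5])
shifted = visual_necessity([0.9, 0.1], [0.5, 0.5])
```

Under this reading, a model whose answer distribution is identical with and without the image scores zero, while one whose confidence collapses when the image is removed scores higher.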
