VCE: A zero-cost hallucination mitigation method for LVLMs via visual contrastive editing

arXiv cs.CL / April 22, 2026


Key Points

  • Large vision-language models often produce object hallucinations—describing objects not present in the input—which is especially risky in domains like medical imaging and autonomous driving.
  • The paper argues that hallucinations are driven in part by language priors learned during pretraining, which bias the model toward statistically likely words.
  • It proposes Visual Contrastive Editing (VCE), a label-free, post-hoc method that uses contrastive visual perturbations to detect and suppress hallucination tendencies.
  • VCE applies targeted parameter edits based on SVD-based decomposition to isolate hallucination-relevant subspaces, avoiding the need for fine-tuning or labeled data.
  • Experiments show VCE reduces object hallucination on multiple benchmarks while adding no inference-time overhead, preserving the model's original computational efficiency.

Abstract

Large vision-language models (LVLMs) frequently suffer from Object Hallucination (OH), wherein they generate descriptions containing objects that are not actually present in the input image. This phenomenon is particularly problematic in real-world applications such as medical imaging and autonomous driving, where accuracy is critical. Recent studies suggest that the hallucination problem may stem from language priors: biases learned during pretraining that cause LVLMs to generate words based on their statistical co-occurrence. To mitigate this problem, we propose Visual Contrastive Editing (VCE), a novel post-hoc method that identifies and suppresses hallucinatory tendencies by analyzing the model's response to contrastive visual perturbations. Using Singular Value Decomposition (SVD), we decompose the model's activation patterns to isolate hallucination subspaces and apply targeted parameter edits to attenuate their influence. Unlike existing approaches that require fine-tuning or labeled data, VCE operates as a label-free intervention, making it both scalable and practical for deployment in resource-constrained settings. Experimental results demonstrate that VCE effectively reduces object hallucination across multiple benchmarks while maintaining the model's original computational efficiency.
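The core mechanics described in the abstract — collect activations under contrastive visual perturbations, use SVD to extract a low-rank subspace, then edit a weight matrix to attenuate that subspace — can be sketched as follows. This is a minimal illustration assuming dense activation matrices and a simple orthogonal-projection edit; the function names, the rank `k`, and the choice of which singular directions count as "hallucination-relevant" are assumptions of this sketch, not the authors' implementation.

```python
import numpy as np

def hallucination_subspace(acts_orig, acts_perturbed, k=4):
    """Estimate a rank-k subspace from contrastive activation differences.

    acts_orig, acts_perturbed: (n_samples, d) activations collected on the
    original images and on their visually perturbed counterparts.
    This sketch assumes the top singular directions of the contrastive
    difference carry the hallucination-relevant signal.
    """
    diff = acts_orig - acts_perturbed            # contrastive response, (n, d)
    _, _, vt = np.linalg.svd(diff, full_matrices=False)
    return vt[:k]                                # orthonormal basis, (k, d)

def edit_weights(weight, basis):
    """Apply a targeted, label-free edit to a (d_out, d) weight matrix.

    Projects the input features onto the orthogonal complement of the
    subspace, attenuating its influence without any fine-tuning.
    """
    d = weight.shape[1]
    projector = basis.T @ basis                  # (d, d) projector onto subspace
    return weight @ (np.eye(d) - projector)      # remove subspace components
```

Because the edit is a one-time matrix multiplication baked into the weights, the edited model runs at exactly the original inference cost, which is consistent with the paper's "zero-cost" framing.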