HTDC: Hesitation-Triggered Differential Calibration for Mitigating Hallucination in Large Vision-Language Models

arXiv cs.CV / 4/15/2026


Key Points

  • The paper identifies that hallucinations in large vision-language models can stem from unstable visual grounding combined with over-reliance on language priors.
  • It proposes Hesitation-Triggered Differential Calibration (HTDC), a training-free decoding method that applies calibration only at layer-wise “hesitation” steps rather than at every token.
  • The hesitation signal is derived from fluctuations in token preference across intermediate layers and serves as a detector of grounding instability.
  • When triggered, HTDC compares the standard full-branch inference against two lightweight probes (visual-nullification and semantic-nullification) to suppress hallucination-prone candidates.
  • Experiments on hallucination benchmarks show HTDC reduces hallucinations while preserving task accuracy and lowering computation versus per-step calibration.
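The layer-wise hesitation trigger can be illustrated with a minimal sketch. Here, hesitation is approximated by counting how often the top-1 token flips between consecutive intermediate-layer readouts (e.g. logit-lens style per-layer logits); the flip count and threshold are illustrative assumptions, not the paper's exact criterion.

```python
def is_hesitation_step(layer_logits, flip_threshold=2):
    """Hypothetical hesitation detector for one decoding step.

    layer_logits: list of per-layer logit lists (one list per
    intermediate layer) for the current token position.

    Frequent disagreement in the top-1 token across consecutive
    layers is read as a sign of unstable visual grounding.
    """
    # Top-1 token index at each intermediate layer.
    top1 = [max(range(len(logits)), key=logits.__getitem__)
            for logits in layer_logits]
    # Count consecutive-layer flips in the preferred token.
    flips = sum(a != b for a, b in zip(top1, top1[1:]))
    return flips >= flip_threshold

# Toy example over a 4-token vocabulary:
stable = [[0.1, 0.2, 0.1, 0.9]] * 4           # every layer prefers token 3
shaky = [[0.9, 0.1, 0.1, 0.1],                # each layer prefers a
         [0.1, 0.9, 0.1, 0.1],                # different token
         [0.1, 0.1, 0.9, 0.1],
         [0.1, 0.1, 0.1, 0.9]]
print(is_hesitation_step(stable))  # False: no flips, calibration skipped
print(is_hesitation_step(shaky))   # True: 3 flips, calibration triggered
```

On stable steps the detector returns False and standard full-branch decoding proceeds untouched, which is what lets HTDC avoid the per-step overhead of always-on calibration.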

Abstract

Large vision-language models (LVLMs) achieve strong multimodal performance, but still suffer from hallucinations caused by unstable visual grounding and over-reliance on language priors. Existing training-free decoding methods typically apply calibration at every decoding step, introducing unnecessary computation and potentially disrupting stable predictions. We address this problem by identifying layer-wise hesitation, a simple signal of grounding instability reflected by fluctuations in token preference across intermediate layers. Based on this observation, we propose Hesitation-Triggered Differential Calibration (HTDC), a training-free decoding framework that preserves standard full-branch inference and activates calibration only at hesitation-prone steps. When triggered, HTDC contrasts the full branch with two lightweight probes, a visual-nullification probe and a semantic-nullification probe, to suppress hallucination-prone candidates while avoiding unnecessary intervention on stable steps. Experiments on representative hallucination benchmarks show that HTDC consistently reduces hallucinations while maintaining strong task accuracy, achieving a favorable trade-off between effectiveness and computational overhead.
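The calibration step described above can be sketched as a contrastive combination of the full branch with the two nullification probes. The additive contrastive form and the `alpha`/`beta` weights below are assumptions in the spirit of contrastive decoding; the paper's exact combination rule may differ.

```python
def calibrate_logits(full, visual_null, semantic_null,
                     alpha=1.0, beta=1.0):
    """Hypothetical differential calibration at a hesitation step.

    full:          logits from the standard full-branch inference
    visual_null:   logits from a probe with visual input nullified
    semantic_null: logits from a probe with semantic context nullified

    Tokens that stay strong even without visual evidence (language
    priors) are penalized; visually grounded tokens are boosted.
    """
    return [f + alpha * (f - v) + beta * (f - s)
            for f, v, s in zip(full, visual_null, semantic_null)]

# Toy example with two candidate tokens:
full = [2.0, 2.0]           # both equally preferred by the full branch
visual_null = [0.5, 2.0]    # token 0 collapses without the image;
                            # token 1 is driven by the language prior
semantic_null = [1.0, 1.0]
calibrated = calibrate_logits(full, visual_null, semantic_null)
print(calibrated)           # grounded token 0 now outranks token 1
```

Because this contrast only runs at hesitation-prone steps, the two probes add cost on a small fraction of tokens rather than at every decoding step.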