Hallucination-aware intermediate representation edit in large vision-language models

arXiv cs.CV / 4/1/2026

Key Points

The paper addresses hallucinations in large vision-language models, focusing on cases where model outputs contradict visual facts.
It proposes a hallucination-aware intermediate representation edit framework that dynamically detects hallucination representations and then applies hallucination-eliminating edits.
Compared with retraining-based mitigation, the method aims to avoid heavy training costs, and compared with contrastive decoding it seeks to avoid dual-inference overhead.
Experiments report state-of-the-art results on existing benchmarks with minimal extra compute, and show robustness and strong controllability over hallucinations.
The authors provide implementation code via the linked GitHub repository to support reproducibility and practical adoption.
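
The detect-then-edit idea in the points above can be illustrated with a toy sketch. This is not the authors' implementation: the summary does not say how hallucination representations are detected or edited, so the projection-based detector, the `direction` vector, the `threshold`, and the `strength` knob below are all hypothetical choices made only to show the general shape of such a pipeline on a single hidden-state vector.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length."""
    norm = dot(v, v) ** 0.5
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def detect_and_edit(
    hidden: Vector,
    direction: Vector,
    threshold: float = 0.5,
    strength: float = 1.0,
) -> Tuple[Vector, bool]:
    """Toy detect-then-edit step on one intermediate representation.

    ASSUMPTION: "detection" is modelled as the projection of the hidden
    state onto a hypothetical hallucination direction, and "editing" as
    removing a fraction of that projection. The paper's actual detector
    and edit operator may differ.
    """
    unit = normalize(direction)
    score = dot(hidden, unit)  # hypothetical detector score
    if score <= threshold:
        # Nothing flagged: return the representation unchanged.
        return list(hidden), False
    # Flagged: subtract `strength` times the component along the direction.
    edited = [h - strength * score * u for h, u in zip(hidden, unit)]
    return edited, True


if __name__ == "__main__":
    edited, flagged = detect_and_edit([3.0, 4.0], [1.0, 0.0])
    print(edited, flagged)
```

In this sketch the edit runs inside a single forward pass, which is the kind of cost profile the summary contrasts with dual-inference methods. The `strength` parameter stands in for the controllability the summary mentions; setting it below 1.0 removes only part of the flagged component.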

Abstract

Large Vision-Language Models have demonstrated exceptional performance in multimodal reasoning and complex scene understanding. However, these models still face significant hallucination issues, where outputs contradict visual facts. Recent research on hallucination mitigation has focused on retraining methods and Contrastive Decoding (CD) methods. While both approaches perform well, retraining methods require substantial training resources, and CD methods introduce dual-inference overhead. These factors hinder their practical applicability. To address these limitations, we propose a framework for dynamically detecting hallucination representations and performing hallucination-eliminating edits on these representations. With minimal additional computational cost, we achieve state-of-the-art performance on existing benchmarks. Extensive experiments demonstrate the effectiveness of our approach, highlighting its efficient and robust hallucination elimination capability and its powerful controllability over hallucinations. Code is available at https://github.com/ASGO-MM/HIRE
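
To make the dual-inference overhead of contrastive decoding concrete, here is a minimal sketch of one decoding step under one common formulation, in which logits from the original input are contrasted with logits from a distorted input. The abstract does not specify which CD variants it compares against, so the `(1 + alpha) * original - alpha * distorted` rule, the `forward` callable, and the `alpha` parameter are assumptions used for illustration only.

```python
from typing import Callable, List

Logits = List[float]


def contrastive_decode_step(
    forward: Callable[[str], Logits],
    original: str,
    distorted: str,
    alpha: float = 1.0,
) -> Logits:
    """One contrastive-decoding step (illustrative formulation).

    ASSUMPTION: `forward` is a hypothetical stand-in for a full model
    forward pass that returns next-token logits for a given input.
    """
    logits_original = forward(original)    # forward pass 1
    logits_distorted = forward(distorted)  # forward pass 2: the extra cost
    # Amplify what the original input supports and penalize what the
    # distorted input also predicts.
    return [
        (1 + alpha) * a - alpha * b
        for a, b in zip(logits_original, logits_distorted)
    ]


if __name__ == "__main__":
    calls = []

    def toy_forward(inp: str) -> Logits:
        # Toy model: fixed logits per input, recording each call.
        calls.append(inp)
        return {"clean": [2.0, 1.0], "noisy": [1.0, 1.0]}[inp]

    print(contrastive_decode_step(toy_forward, "clean", "noisy"))
    print(len(calls), "forward passes for one decoding step")
```

Every generated token requires two calls to `forward` in this scheme, which is the overhead the abstract refers to when motivating an approach that edits representations instead.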
