EnsemHalDet: Robust VLM Hallucination Detection via Ensemble of Internal State Detectors

arXiv cs.CL / 4/6/2026


Key Points

  • EnsemHalDet is a hallucination-detection framework for vision-language models that identifies incorrect or ungrounded outputs by inspecting internal representations rather than relying only on final model responses.
  • The method uses an ensemble of multiple internal-state detectors, training separate detectors on diverse signals such as attention outputs and hidden states to capture a wider range of hallucination patterns.
  • Experiments on several VQA datasets and across multiple VLMs show EnsemHalDet achieves consistently better AUC than prior approaches and single-detector baselines.
  • The paper argues that ensembling heterogeneous internal signals improves the robustness and reliability of multimodal hallucination detection.

Abstract

Vision-Language Models (VLMs) excel at multimodal tasks, but they remain prone to hallucinations, producing content that is factually incorrect or ungrounded in the input image. Recent work suggests that hallucination detection based on internal representations is more efficient and accurate than approaches that rely solely on model outputs. However, existing internal-representation-based methods typically rely on a single representation or detector, limiting their ability to capture diverse hallucination signals. In this paper, we propose EnsemHalDet, an ensemble-based hallucination detection framework that leverages multiple internal representations of VLMs, including attention outputs and hidden states. EnsemHalDet trains an independent detector for each representation and combines them through ensemble learning. Experimental results across multiple VQA datasets and VLMs show that EnsemHalDet consistently outperforms prior methods and single-detector models in terms of AUC. These results demonstrate that ensembling diverse internal signals significantly improves robustness in multimodal hallucination detection.
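To make the pipeline concrete, here is a minimal sketch of the core idea: train one detector per internal signal and average their hallucination probabilities, scoring the result with AUC. This is not the paper's implementation; the feature extraction is replaced with synthetic stand-ins for "hidden state" and "attention output" features, the detectors are tiny pure-Python logistic regressions, and the simple probability-averaging ensemble is an assumption for illustration.

```python
import math
import random

random.seed(0)

def train_logreg(X, y, lr=0.1, epochs=200):
    """Tiny logistic-regression detector trained with plain SGD."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi  # gradient of log loss w.r.t. z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(model, X):
    """Per-example hallucination probabilities from one detector."""
    w, b = model
    return [1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))
            for xi in X]

def auc(scores, labels):
    """AUC = probability a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def make_split(n, shift=1.0, dim=4):
    # Synthetic "internal representation" features (hypothetical data):
    # hallucinated answers (label 1) get slightly shifted features.
    X, y = [], []
    for _ in range(n):
        label = random.randint(0, 1)
        X.append([random.gauss(shift * label, 1.0) for _ in range(dim)])
        y.append(label)
    return X, y

# Two correlated views standing in for hidden states and attention outputs.
hidden_tr, y_tr = make_split(200)
attn_tr = [[v + random.gauss(0, 0.5) for v in x] for x in hidden_tr]
hidden_te, y_te = make_split(100)
attn_te = [[v + random.gauss(0, 0.5) for v in x] for x in hidden_te]

# One independent detector per internal signal, as in the paper.
m_hidden = train_logreg(hidden_tr, y_tr)
m_attn = train_logreg(attn_tr, y_tr)

# Ensemble by averaging the per-detector hallucination probabilities.
p_hidden = predict(m_hidden, hidden_te)
p_attn = predict(m_attn, attn_te)
p_ens = [(a + b) / 2 for a, b in zip(p_hidden, p_attn)]

print("ensemble AUC:", round(auc(p_ens, y_te), 3))
```

In practice the features would come from the VLM's forward pass (e.g., layer-wise hidden states and attention outputs for the generated answer), and the paper's ensemble-learning step may be more elaborate than uniform averaging; the sketch only shows why combining heterogeneous signals can beat any single detector.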