Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps

arXiv cs.CL / 4/22/2026

📰 News · Models & Research

Key Points

  • The paper proposes inference-time hallucination detection for SpeechLLMs without needing costly gold-standard outputs by using attention-derived metrics tailored to audio inputs.
  • It introduces four attention-based features—AUDIORATIO, AUDIOCONSISTENCY, AUDIOENTROPY, and TEXTENTROPY—and trains lightweight logistic regression classifiers to flag likely hallucinations efficiently.
  • Experiments on ASR and speech-to-text translation using Qwen-2-Audio and Voxtral-3B show the method outperforms uncertainty- and prior-attention-based baselines on in-domain data, with up to +0.23 PR-AUC improvements.
  • The approach also generalizes to out-of-domain ASR, and strong results can be achieved with roughly 100 attention heads rather than all of them, which improves out-of-domain generalization in some settings.
  • Effectiveness depends on the specific model and task, and the classifier requires task-specific training, but the study demonstrates attention patterns as a practical signal for SpeechLLM hallucination detection.

Abstract

Hallucinations in Speech Large Language Models (SpeechLLMs) pose significant risks, yet existing detection methods typically rely on gold-standard outputs that are costly or impractical to obtain. Moreover, hallucination detection methods developed for text-based LLMs do not directly capture audio-specific signals. We investigate four attention-derived metrics: AUDIORATIO, AUDIOCONSISTENCY, AUDIOENTROPY, and TEXTENTROPY, designed to capture pathological attention patterns associated with hallucination, and train lightweight logistic regression classifiers on these features for efficient inference-time detection. Across automatic speech recognition and speech-to-text translation tasks, evaluations on Qwen-2-Audio and Voxtral-3B show that our approach outperforms uncertainty-based and prior attention-based baselines on in-domain data, achieving improvements of up to +0.23 PR-AUC, and generalises to out-of-domain ASR settings. We further find that strong performance can be achieved with approximately 100 attention heads, improving out-of-domain generalisation compared to using all heads. While effectiveness is model-dependent and task-specific training is required, our results demonstrate that attention patterns provide a valuable tool for hallucination detection in SpeechLLMs.
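To make the pipeline concrete, here is a minimal sketch of the two-stage recipe the abstract describes: summarize an attention map into the four scalar features, then fit a lightweight logistic regression to flag hallucinations. The feature formulas below (attention mass on the audio span, entropy over each span, and a cosine-similarity proxy for consistency) and the synthetic data are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

AUDIO_LEN, TEXT_LEN = 10, 6  # assumed input layout: audio tokens first, then text

def attention_features(attn, audio_len=AUDIO_LEN):
    """Reduce one head's attention map (generated tokens x input tokens) to the
    four features named in the paper; formulas are illustrative guesses."""
    eps = 1e-12
    audio, text = attn[:, :audio_len], attn[:, audio_len:]
    audio_ratio = audio.sum(axis=1).mean()                 # AUDIORATIO: mass on audio span
    pa = audio / (audio.sum(axis=1, keepdims=True) + eps)  # renormalized over audio tokens
    pt = text / (text.sum(axis=1, keepdims=True) + eps)
    audio_entropy = -(pa * np.log(pa + eps)).sum(axis=1).mean()  # AUDIOENTROPY
    text_entropy = -(pt * np.log(pt + eps)).sum(axis=1).mean()   # TEXTENTROPY
    # AUDIOCONSISTENCY proxy: cosine similarity of consecutive audio-attention rows
    n = np.linalg.norm(audio, axis=1) + eps
    audio_consistency = ((audio[:-1] * audio[1:]).sum(axis=1) / (n[:-1] * n[1:])).mean()
    return np.array([audio_ratio, audio_consistency, audio_entropy, text_entropy])

def synthetic_attention(audio_bias, rng, n_tokens=8):
    """Toy attention map: a positive bias pushes mass onto the audio span,
    mimicking a faithful output; a negative bias mimics attention drift."""
    logits = rng.normal(size=(n_tokens, AUDIO_LEN + TEXT_LEN))
    logits[:, :AUDIO_LEN] += audio_bias
    e = np.exp(logits - logits.max(axis=1, keepdims=True))  # row-wise softmax
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
# Label 1 = hallucinated (attention drifts off the audio), label 0 = faithful.
X = np.stack([attention_features(synthetic_attention(b, rng))
              for b in [2.0] * 20 + [-2.0] * 20])
y = np.array([0] * 20 + [1] * 20)
clf = LogisticRegression().fit(X, y)
train_acc = clf.score(X, y)
```

In practice the features would be extracted per attention head from the SpeechLLM's forward pass at inference time, and (per the paper's finding) restricting the classifier to a subset of roughly 100 informative heads rather than all of them can help out-of-domain generalization.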