RAGognizer: Hallucination-Aware Fine-Tuning via Detection Head Integration

arXiv cs.CL / 4/20/2026


Key Points

  • The paper proposes the RAGognize dataset and RAGognizer fine-tuning to make hallucination detection part of training for Retrieval-Augmented Generation (RAG) systems rather than a post-hoc check.
  • It introduces a new dataset of naturally occurring closed-domain hallucinations with token-level annotations, enabling supervised hallucination-aware learning.
  • The method integrates a lightweight detection head into an LLM so the model can jointly optimize language modeling and hallucination detection.
  • By improving separability of internal representations tied to hallucinations, the approach both boosts token-level hallucination detection performance and reduces hallucination rates during generation.
  • Experiments on multiple benchmarks report state-of-the-art token-level hallucination detection and substantial hallucination reduction without hurting language quality or relevance.
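The core mechanism, a lightweight detection head jointly trained with the language-modeling objective, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the head architecture (a single linear probe here), the class names, and the mixing weight `alpha` are assumptions, and the paper does not specify how the two losses are weighted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HallucinationHead(nn.Module):
    """Hypothetical lightweight head: one binary logit per token,
    computed from the LLM's hidden states."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size) -> (batch, seq_len)
        return self.classifier(hidden_states).squeeze(-1)

def joint_loss(lm_logits: torch.Tensor,
               lm_labels: torch.Tensor,
               det_logits: torch.Tensor,
               det_labels: torch.Tensor,
               alpha: float = 0.5) -> torch.Tensor:
    """Joint objective: next-token cross-entropy plus token-level
    hallucination BCE. `alpha` is an assumed mixing weight."""
    lm_loss = F.cross_entropy(
        lm_logits.view(-1, lm_logits.size(-1)), lm_labels.view(-1))
    det_loss = F.binary_cross_entropy_with_logits(
        det_logits.view(-1), det_labels.view(-1).float())
    return lm_loss + alpha * det_loss

# Toy usage with random tensors standing in for model outputs
# and RAGognize-style token-level annotations.
torch.manual_seed(0)
batch, seq_len, hidden, vocab = 2, 5, 8, 10
hidden_states = torch.randn(batch, seq_len, hidden)
head = HallucinationHead(hidden)
det_logits = head(hidden_states)                       # (2, 5)
lm_logits = torch.randn(batch, seq_len, vocab)
lm_labels = torch.randint(0, vocab, (batch, seq_len))
det_labels = torch.randint(0, 2, (batch, seq_len))     # 1 = hallucinated token
loss = joint_loss(lm_logits, lm_labels, det_logits, det_labels)
```

Because gradients from the detection loss flow back into the shared hidden states, fine-tuning under this combined objective pushes the model's internal representations toward being separable with respect to hallucinations, which is the effect the paper credits for the reduced hallucination rates.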

Abstract

Retrieval-Augmented Generation (RAG) is widely used to augment the input to Large Language Models (LLMs) with external information, such as recent or domain-specific knowledge. Nonetheless, current models still produce closed-domain hallucinations and generate content that is unsupported by the retrieved context. Existing detection approaches typically treat hallucination as a post-hoc problem, relying on black-box consistency checks or probes over frozen internal representations. In this work, we demonstrate that hallucination detection based on internal state representations can also serve as a direct training signal. We introduce RAGognize, a dataset of naturally occurring closed-domain hallucinations with token-level annotations, and RAGognizer, a hallucination-aware fine-tuning approach that integrates a lightweight detection head into an LLM, allowing for the joint optimization of language modeling and hallucination detection. This joint objective forces the model to improve the separability of its internal states regarding hallucinations while simultaneously learning to generate well-formed and meaningful responses. Across multiple benchmarks, RAGognizer achieves state-of-the-art token-level hallucination detection while substantially reducing hallucination rates during generation, without degrading language quality or relevance.
