Facet-Level Tracing of Evidence Uncertainty and Hallucination in RAG
arXiv cs.CL / 4/13/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper argues that hallucinations in RAG persist even when relevant documents are retrieved, and that existing answer- or passage-level evaluations miss how evidence is actually used during generation.
- It introduces a facet-level diagnostics framework that breaks each QA question into atomic "reasoning facets" and measures evidence sufficiency and grounding via a Facet × Chunk matrix that combines retrieval relevance with NLI-based faithfulness (see the first sketch after this list).
- The method compares three inference modes, Strict RAG (retrieval-only), Soft RAG (retrieval plus parametric knowledge), and LLM-only (no retrieval), to quantify retrieval-generation misalignment: cases where relevant evidence is retrieved but not properly integrated (see the second sketch below).
- Experiments on medical QA and HotpotQA using multiple LLMs (GPT, Gemini, LLaMA) show recurring failure modes such as evidence absence, evidence misalignment, and prior-driven overrides that are largely invisible under standard answer-level metrics.
- The results suggest hallucination drivers in RAG are less about retrieval accuracy and more about how retrieved evidence is integrated with the model's prior knowledge, with the proposed framework enabling interpretable diagnosis of those integration failures.
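To make the Facet × Chunk matrix concrete, here is a minimal sketch of how such a matrix could be computed. It assumes pluggable scorers for retrieval relevance and NLI entailment; the function names (`relevance_fn`, `nli_entail_fn`, `diagnose_facets`) and the threshold `tau` are illustrative choices, not the paper's implementation.

```python
import numpy as np


def facet_chunk_matrix(facets, chunks, relevance_fn, nli_entail_fn):
    """Build an F x C evidence matrix for F facets and C retrieved chunks.

    relevance_fn(facet, chunk)         -> score in [0, 1] (e.g. a cross-encoder)
    nli_entail_fn(premise, hypothesis) -> entailment probability in [0, 1]
    Both scorers are assumed plug-ins, not part of this sketch.
    """
    M = np.zeros((len(facets), len(chunks)))
    for i, facet in enumerate(facets):
        for j, chunk in enumerate(chunks):
            # A chunk counts as evidence for a facet only if it is both
            # relevant to it AND entails it; the product enforces the
            # conjunction and keeps the score in [0, 1].
            M[i, j] = relevance_fn(facet, chunk) * nli_entail_fn(chunk, facet)
    return M


def diagnose_facets(facets, M, tau=0.5):
    """Flag a facet as grounded if some chunk's combined score clears tau
    (0.5 is an arbitrary placeholder threshold)."""
    best = M.max(axis=1)  # strongest single piece of evidence per facet
    return {f: {"support": float(s), "grounded": bool(s >= tau)}
            for f, s in zip(facets, best)}
```

Under this reading, a row with uniformly low scores roughly corresponds to the paper's "evidence absence" failure mode, while high relevance paired with low entailment corresponds to "evidence misalignment."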
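The three inference modes can plausibly be operationalized at the prompt level. The templates below are hypothetical wording to illustrate the contrast, not the paper's exact prompts.

```python
def build_prompt(question, chunks, mode):
    """Construct a prompt for one of the three inference modes."""
    context = "\n\n".join(chunks)
    if mode == "strict_rag":
        # Retrieval-only: the model must answer from the chunks alone.
        return ("Answer using ONLY the context below. If the context is "
                f"insufficient, say so.\n\nContext:\n{context}\n\n"
                f"Question: {question}")
    if mode == "soft_rag":
        # Retrieval plus parametric knowledge: context is advisory.
        return ("Use the context below where helpful, and your own knowledge "
                f"otherwise.\n\nContext:\n{context}\n\nQuestion: {question}")
    if mode == "llm_only":
        # No retrieval: the model relies entirely on parametric knowledge.
        return f"Question: {question}"
    raise ValueError(f"unknown mode: {mode!r}")
```

Comparing per-facet correctness across the three modes is what localizes the failure: a facet answered correctly LLM-only but wrongly under Strict RAG points to an integration problem, while a Strict RAG answer that contradicts well-supported chunks yet matches the LLM-only answer suggests a prior-driven override.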