Unmasking Hallucinations: A Causal Graph-Attention Perspective on Factual Reliability in Large Language Models

arXiv cs.CL / 4/7/2026


Key Points

  • The paper tackles factual hallucinations in large language models by analyzing how internal transformer attention contributes to unsupported outputs.
  • It proposes a causal graph attention network (GCAN) that builds token-level graphs using self-attention weights and gradient-based influence scores to measure factual dependencies.
  • The method introduces a Causal Contribution Score (CCS) to quantify how much each token causally contributes to the model’s factual reliability.
  • A fact-anchored graph reweighting layer is used during generation to dynamically downweight hallucination-prone nodes.
  • Experiments on TruthfulQA and HotpotQA report a 27.8% reduction in hallucination rate and a 16.4% improvement in factual accuracy versus baseline retrieval-augmented generation (RAG) models.
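The paper itself does not publish an implementation, but the core idea in the second and third bullets, combining self-attention weights with gradient-based influence to score each token's causal contribution, can be sketched roughly as follows. All function names and the exact combination rule (attention edge weight times gradient magnitude, summed and normalized) are illustrative assumptions, not the authors' code:

```python
import numpy as np

def causal_contribution_scores(attn, grads):
    """Illustrative sketch of a Causal Contribution Score (CCS).

    attn:  (T, T) self-attention weights; row i attends over tokens j.
    grads: (T,)   per-token gradient-based influence magnitudes.

    Builds a token-level graph whose edge (j -> i) is weighted by the
    attention weight times |grad_j|, then scores each token by its total
    outgoing causal influence, normalized to a distribution.
    """
    # edge weights of the token-level graph (broadcast grads over columns)
    edges = attn * np.abs(grads)[None, :]          # (T, T)
    ccs = edges.sum(axis=0)                        # total influence exerted by each token
    return ccs / (ccs.sum() + 1e-12)               # normalize; guard against all-zero case

# toy example: 3 tokens, token 0 has the largest gradient influence
attn = np.array([[0.6, 0.3, 0.1],
                 [0.2, 0.5, 0.3],
                 [0.1, 0.2, 0.7]])
grads = np.array([0.9, 0.1, 0.4])
scores = causal_contribution_scores(attn, grads)
```

In this toy case the high-gradient token 0 receives the largest score, which matches the intuition that CCS should surface the tokens the output most depends on.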

Abstract

This paper focuses on hallucinations produced by large language models (LLMs). LLMs have shown extraordinary language understanding and generation capabilities, yet they suffer from a major disadvantage: hallucinations, i.e., outputs that are factually incorrect, misleading, or unsupported by the input data. Such hallucinations cause serious problems in scenarios like medical diagnosis or legal reasoning. In this work, we propose a causal graph attention network (GCAN) framework that reduces hallucinations by interpreting the internal attention flow within a transformer architecture, constructing token-level graphs that combine self-attention weights with gradient-based influence scores. Our method quantifies each token's factual dependency using a new metric, the Causal Contribution Score (CCS). We further introduce a fact-anchored graph reweighting layer that dynamically reduces the influence of hallucination-prone nodes during generation. Experiments on standard benchmarks such as TruthfulQA and HotpotQA show a 27.8% reduction in hallucination rate and a 16.4% improvement in factual accuracy over baseline retrieval-augmented generation (RAG) models. This work contributes to the interpretability, robustness, and factual reliability of future LLM architectures.
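The fact-anchored reweighting step described in the abstract could plausibly take a form like the sketch below: tokens flagged as fact anchors keep their full attention weight, other tokens are scaled down in proportion to their (low) causal score, and each row is renormalized so it remains a valid attention distribution. The function name, the `alpha` floor, and the masking scheme are all assumptions for illustration, not the paper's actual layer:

```python
import numpy as np

def fact_anchored_reweight(attn, ccs, anchor_mask, alpha=0.5):
    """Illustrative fact-anchored graph reweighting.

    attn:        (T, T) attention weights, rows sum to 1.
    ccs:         (T,)   causal contribution scores per token.
    anchor_mask: (T,)   bool; True for fact-anchored tokens.
    alpha:       floor on the downweighting of non-anchor tokens.
    """
    # anchors keep weight 1.0; others are scaled by their relative CCS,
    # so hallucination-prone (low-CCS) nodes lose influence
    scale = np.where(anchor_mask,
                     1.0,
                     alpha + (1.0 - alpha) * ccs / (ccs.max() + 1e-12))
    reweighted = attn * scale[None, :]
    # renormalize each row back to a probability distribution
    return reweighted / reweighted.sum(axis=1, keepdims=True)

# toy example: token 0 is a fact anchor, tokens 1-2 have low CCS
attn = np.array([[0.5, 0.3, 0.2],
                 [0.2, 0.6, 0.2],
                 [0.3, 0.3, 0.4]])
ccs = np.array([0.8, 0.1, 0.1])
anchors = np.array([True, False, False])
out = fact_anchored_reweight(attn, ccs, anchors)
```

After reweighting, the anchor token's share of each row grows at the expense of the low-CCS tokens, which is the qualitative behavior the abstract ascribes to the layer.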