Where Fake Citations Are Made: Tracing Field-Level Hallucination to Specific Neurons in LLMs

arXiv cs.AI / April 22, 2026

💬 Opinion · Models & Research

Key Points

  • The study analyzes hallucinated citations in 9 LLMs using 108,000 generated references and finds that author-name fields fail more often than other citation fields across models and settings.
  • Citation formatting/style does not significantly change citation accuracy, while reasoning-focused distillation can reduce recall of correct citation elements.
  • Field-level hallucination signals are largely non-transferable: probes trained on one citation field only transfer to others at near-chance performance.
  • By applying elastic-net regularization with stability selection to neuron-level CETT values in Qwen2.5-32B-Instruct, the researchers identify a sparse set of field-specific hallucination neurons (FH-neurons). Causal interventions confirm their role: boosting these neurons increases hallucination, while suppressing them improves citation performance across fields.
  • The work proposes a lightweight detection/mitigation strategy for citation hallucination based on internal neuron signals rather than external supervision.
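The elastic-net-with-stability-selection step described above can be sketched as follows. Everything here is a hypothetical stand-in: the synthetic feature matrix imitates per-neuron CETT values, and the subsample fraction, regularization strength, and selection threshold are illustrative choices, not the paper's actual hyperparameters.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in: rows = generated references, columns = per-neuron
# CETT values; y = 1 if the citation field was hallucinated.
# Only the first 5 "neurons" actually carry signal in this toy setup.
n_samples, n_neurons = 600, 200
X = rng.normal(size=(n_samples, n_neurons))
y = (X[:, :5].sum(axis=1) + 0.5 * rng.normal(size=n_samples) > 0).astype(int)

def stability_selection(X, y, n_runs=30, subsample=0.5, threshold=0.6):
    """For each neuron, count how often its elastic-net weight is nonzero
    across refits on random subsamples; keep neurons above the threshold."""
    counts = np.zeros(X.shape[1])
    for _ in range(n_runs):
        idx = rng.choice(len(y), size=int(subsample * len(y)), replace=False)
        clf = LogisticRegression(
            penalty="elasticnet", solver="saga",
            l1_ratio=0.5, C=0.1, max_iter=2000,
        ).fit(X[idx], y[idx])
        counts += np.abs(clf.coef_[0]) > 1e-8
    return np.where(counts / n_runs >= threshold)[0]

fh_neurons = stability_selection(X, y)
print(fh_neurons)  # a sparse, stable subset of the 200 candidate neurons
```

The stability-selection wrapper is what makes the selected set sparse and reproducible: a neuron counts as an FH-neuron only if the elastic net keeps it across most random subsamples, not just in one fit.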

Abstract

LLMs frequently generate fictitious yet convincing citations, often expressing high confidence even when the underlying reference is wrong. We study this failure across 9 models and 108,000 generated references, and find that author names fail far more often than other fields across all models and settings. Citation style has no measurable effect, while reasoning-oriented distillation degrades recall. Probes trained on one field transfer at near-chance levels to the others, suggesting that hallucination signals do not generalize across fields. Building on this finding, we apply elastic-net regularization with stability selection to neuron-level CETT values of Qwen2.5-32B-Instruct and identify a sparse set of field-specific hallucination neurons (FH-neurons). Causal intervention further confirms their role: amplifying these neurons increases hallucination, while suppressing them improves performance across fields, with larger gains in some fields. These results suggest a lightweight approach to detecting and mitigating citation hallucination using internal model signals alone.
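The amplify/suppress intervention can be illustrated on a toy MLP layer. This is a minimal sketch only: the weights, dimensions, and neuron indices below are made up, whereas the paper intervenes on actual activations inside Qwen2.5-32B-Instruct.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for one transformer MLP layer: scale a few hypothetical
# FH-neuron activations before the down-projection, mimicking the
# amplify/suppress causal experiment.
d_model, d_ff = 16, 64
W_up = rng.normal(scale=0.1, size=(d_model, d_ff))
W_down = rng.normal(scale=0.1, size=(d_ff, d_model))
fh_neurons = [3, 17, 42]  # illustrative indices, not from the paper

def mlp(x, scale=1.0):
    h = np.tanh(x @ W_up)        # hidden activations (tanh for simplicity)
    h[:, fh_neurons] *= scale    # scale < 1 suppresses, scale > 1 amplifies
    return h @ W_down

x = rng.normal(size=(2, d_model))
baseline = mlp(x)
suppressed = mlp(x, scale=0.0)
amplified = mlp(x, scale=2.0)
print(np.abs(baseline - suppressed).max())  # effect size of the intervention
```

In a real model the same idea would be implemented as a forward hook that rescales the chosen hidden units at inference time, which is what makes the mitigation lightweight: no retraining or external supervision is needed.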