Where Fake Citations Are Made: Tracing Field-Level Hallucination to Specific Neurons in LLMs
arXiv cs.AI / 4/22/2026
💬 Opinion / Models & Research
Key Points
- The study analyzes hallucinated citations in 9 LLMs using 108,000 generated references and finds that author-name fields fail more often than other citation fields across models and settings.
- Citation formatting/style does not significantly change citation accuracy, while reasoning-focused distillation can reduce recall of correct citation elements.
- Field-level hallucination signals are largely non-transferable: probes trained on one citation field transfer to other fields only at near-chance accuracy (see the probe-transfer sketch after this list).
- By applying elastic-net regularization with stability selection to neuron-level CETT values in Qwen2.5-32B-Instruct, the researchers identify a sparse set of field-specific hallucination (FH) neurons; causal interventions confirm that boosting these neurons increases hallucinations, while suppressing them improves citation accuracy across fields (a stability-selection sketch also follows below).
- The work proposes a lightweight detection/mitigation strategy for citation hallucination based on internal neuron signals rather than external supervision.
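To make the transferability claim concrete, here is a minimal sketch of a cross-field probe experiment. It assumes you have already extracted hidden-state feature vectors for generated citations together with per-field hallucination labels; the `author_X`, `title_X`, and label arrays below are random placeholders, not data from the paper, and the paper's exact probing setup may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical stand-ins: hidden-state features per generated citation and
# binary labels marking whether a given field was hallucinated.
n, d = 2000, 512
author_X, author_y = rng.normal(size=(n, d)), rng.integers(0, 2, n)
title_X, title_y = rng.normal(size=(n, d)), rng.integers(0, 2, n)

# Train a linear probe on the author field only.
probe = LogisticRegression(max_iter=1000).fit(author_X[:1500], author_y[:1500])

# In-field evaluation on held-out author-field examples.
in_field_auc = roc_auc_score(author_y[1500:], probe.decision_function(author_X[1500:]))

# Cross-field evaluation: the same probe scored against title-field labels.
# The paper's finding is that this transfer sits near chance (AUC around 0.5).
cross_field_auc = roc_auc_score(title_y, probe.decision_function(title_X))

print(f"in-field AUC:    {in_field_auc:.3f}")
print(f"cross-field AUC: {cross_field_auc:.3f}")
```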
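The neuron-selection step in the fourth bullet can be sketched as stability selection over elastic-net logistic regressions: fit many regularized probes on random subsamples of per-neuron CETT features and keep neurons whose coefficients are repeatedly non-zero. The CETT matrix below is a random placeholder, and the subsample fraction, regularization strength, and selection threshold are illustrative choices, not the paper's values.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical per-neuron CETT values (rows: generations, cols: neurons)
# and binary hallucination labels for one citation field.
n_samples, n_neurons = 1000, 512
cett = rng.normal(size=(n_samples, n_neurons))
hallucinated = rng.integers(0, 2, n_samples)

n_runs, subsample = 50, 0.5
selected_counts = np.zeros(n_neurons)

for _ in range(n_runs):
    # Refit an elastic-net probe on a random half of the data each run.
    idx = rng.choice(n_samples, size=int(subsample * n_samples), replace=False)
    clf = LogisticRegression(
        penalty="elasticnet", solver="saga", l1_ratio=0.5, C=0.1, max_iter=2000
    ).fit(cett[idx], hallucinated[idx])
    selected_counts += (np.abs(clf.coef_[0]) > 1e-6)

# Stability selection: keep neurons chosen in most subsampled fits.
stable = np.where(selected_counts / n_runs >= 0.8)[0]
print(f"candidate field-specific hallucination neurons: {len(stable)}")
```

The causal step reported in the paper, boosting or suppressing the selected neurons during generation, would then operate on this stable set; that intervention happens inside the model and is not sketched here.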