Do Hallucination Neurons Generalize? Evidence from Cross-Domain Transfer in LLMs

arXiv cs.CL · April 23, 2026


Key Points

  • Researchers studied whether so-called “hallucination neurons” (H-neurons), which predict LLM hallucinations on general QA, also transfer across different knowledge domains.
  • They evaluated cross-domain transfer across six domains (general QA, legal, financial, science, moral reasoning, and code vulnerability) using five open-weight LLMs (3B–8B parameters).
  • H-neuron-based classifiers showed strong in-domain performance (AUROC 0.783) but substantially weaker out-of-domain transfer (AUROC 0.563), a degradation consistent across all five models tested.
  • The findings suggest hallucination is not governed by a single universal neural signature; instead, it appears to involve domain-specific neuron populations.
  • As a practical implication, neuron-level hallucination detectors would need domain-specific calibration rather than one-size-fits-all training.

Abstract

Recent work identifies a sparse set of "hallucination neurons" (H-neurons), less than 0.1% of feed-forward network neurons, that reliably predict when large language models will hallucinate. These neurons are identified on general-knowledge question answering and shown to generalize to new evaluation instances. We ask a natural follow-up question: do H-neurons generalize across knowledge domains? Using a systematic cross-domain transfer protocol across 6 domains (general QA, legal, financial, science, moral reasoning, and code vulnerability) and 5 open-weight models (3B to 8B parameters), we find they do not. Classifiers trained on one domain's H-neurons achieve AUROC 0.783 within-domain but only 0.563 when transferred to a different domain (delta = 0.220, p < 0.001), a degradation consistent across all models tested. Our results suggest that hallucination is not a single mechanism with a universal neural signature, but rather involves domain-specific neuron populations that differ depending on the knowledge type being queried. This finding has direct implications for the deployment of neuron-level hallucination detectors, which must be calibrated per domain rather than trained once and applied universally.
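The transfer protocol can be illustrated with a toy sketch (this is not the paper's code; the data, neuron indices, and scoring rule are invented for illustration). The idea: a detector keyed on one domain's predictive neurons separates hallucinations well in that domain, but scores near chance on a domain where a different neuron population carries the signal. AUROC is computed here with the standard rank-based formula.

```python
import numpy as np

def auroc(scores, labels):
    """Rank-based AUROC: probability a random positive outranks a random negative."""
    scores, labels = np.asarray(scores, float), np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()  # ties count half
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

rng = np.random.default_rng(0)
n = 200
# Toy assumption: in domain A the hallucination label tracks neuron 0,
# while in domain B it tracks neuron 1 (a disjoint neuron population).
y_a = rng.integers(0, 2, n)
y_b = rng.integers(0, 2, n)
X_a = rng.normal(size=(n, 2)); X_a[:, 0] += 2.0 * y_a
X_b = rng.normal(size=(n, 2)); X_b[:, 1] += 2.0 * y_b

# "Train" on domain A: the detector simply scores by neuron 0's activation.
score = lambda X: X[:, 0]
in_domain = auroc(score(X_a), y_a)     # strong separation
cross_domain = auroc(score(X_b), y_b)  # near chance
print(f"in-domain AUROC={in_domain:.3f}, cross-domain AUROC={cross_domain:.3f}")
```

The in-domain/out-of-domain gap in this toy setup mirrors the qualitative pattern the paper reports (0.783 vs. 0.563), and is why the authors argue detectors need per-domain calibration.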