Before the First Token: Scale-Dependent Emergence of Hallucination Signals in Autoregressive Language Models

arXiv cs.CL / 4/16/2026

Key Points

The paper investigates when hallucination-indicative internal representations emerge in autoregressive language models by analyzing probe detectability across 7 transformer sizes (117M–7B) and three fact-based datasets (TriviaQA, Simple Facts, Biography).
It reports a scale-dependent phase transition: models under ~400M parameters show chance-level factuality probe performance at all generation positions, while models above ~1B exhibit a qualitatively different regime with peak detectability at position zero (before any tokens are generated).
Cross-architecture evidence suggests the pre-generation hallucination/factuality signal is statistically significant in both Pythia-1.4B and Qwen2.5-7B, indicating the effect is not tied to a single model family or training corpus.
At the 7B scale, post-training appears to matter: the base model Pythia-6.9B shows a flat temporal profile, while instruction-tuned Qwen2.5-7B shows a dominant pre-generation effect, implying that knowledge organization through instruction tuning (or equivalent post-training) shapes these “knowledge circuits.”
The study finds activation steering along probe-derived directions does not fix hallucinations, supporting the conclusion that the measured signal is correlational (useful for detection) rather than causal (useful for direct correction).
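
The "probe detectability by generation position" idea in the points above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the hidden states are synthetic Gaussians, the helper names (`fit_direction`, `auc_by_position`) are hypothetical, and a difference-of-class-means direction stands in for whatever probe the paper actually trains. It only shows the mechanics of fitting a linear direction per position and scoring it with AUC on held-out examples.

```python
import random

def auc(scores, labels):
    """Rank-based AUC: chance that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_vec(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit_direction(X, y):
    """Difference of class means: a simple stand-in for a trained linear probe."""
    mu1 = mean_vec([x for x, t in zip(X, y) if t == 1])
    mu0 = mean_vec([x for x, t in zip(X, y) if t == 0])
    return [a - b for a, b in zip(mu1, mu0)]

def project(X, w):
    """Probe score for each example: dot product with the direction w."""
    return [sum(a * b for a, b in zip(x, w)) for x in X]

def synthetic_states(n, dim, signal, rng):
    """Hypothetical hidden states: label 1 shifts every dimension by `signal`."""
    y = [rng.randint(0, 1) for _ in range(n)]
    X = [[rng.gauss(signal * t, 1.0) for _ in range(dim)] for t in y]
    return X, y

def auc_by_position(signal_per_position, n=400, dim=8, seed=0):
    """Fit on the first half, evaluate on the second half, once per position."""
    rng = random.Random(seed)
    profile = []
    for signal in signal_per_position:
        X, y = synthetic_states(n, dim, signal, rng)
        half = n // 2
        w = fit_direction(X[:half], y[:half])
        profile.append(auc(project(X[half:], w), y[half:]))
    return profile

# Assumed signal strengths: strongest at position 0, absent by position 2.
profile = auc_by_position([1.0, 0.5, 0.0])
```

In this toy setup, a profile that is highest at index 0 and falls toward 0.5 mimics the "peak at position zero, then decline" shape reported for the larger models, while a profile near 0.5 everywhere corresponds to the chance-level regime.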

Abstract

When do large language models decide to hallucinate? Despite serious consequences in healthcare, law, and finance, few formal answers exist. Recent work shows autoregressive models maintain internal representations distinguishing factual from fictional outputs, but when these representations peak as a function of model scale remains poorly understood. We study the temporal dynamics of hallucination-indicative internal representations across 7 autoregressive transformers (117M–7B parameters) using three fact-based datasets (TriviaQA, Simple Facts, Biography; 552 labeled examples). We identify a scale-dependent phase transition: models below 400M parameters show chance-level probe accuracy at every generation position (AUC = 0.48–0.67), indicating no reliable factuality signal. Above ~1B parameters, a qualitatively different regime emerges where peak detectability occurs at position zero (before any tokens are generated), then declines during generation. This pre-generation signal is statistically significant in both Pythia-1.4B (p = 0.012) and Qwen2.5-7B (p = 0.038), spanning distinct architectures and training corpora. At the 7B scale, we observe a striking dissociation: Pythia-6.9B (base model, trained on The Pile) produces a flat temporal profile (Δ = +0.001, p = 0.989), while instruction-tuned Qwen2.5-7B shows a dominant pre-generation effect. This indicates raw scale alone is insufficient: knowledge organization through instruction tuning or equivalent post-training is required for pre-commitment encoding. Activation steering along probe-derived directions fails to correct hallucinations across all models, confirming the signal is correlational rather than causal. Our findings provide scale-calibrated detection protocols and a concrete hypothesis on instruction tuning's role in developing knowledge circuits supporting factual generation.
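
For readers unfamiliar with activation steering, the generic operation is to add a scaled probe direction to a hidden state. The sketch below is illustrative only: the abstract does not specify the layer, the scale, or the normalization used, so `steer`, `probe_score`, `alpha`, and the toy vectors are all assumptions rather than the authors' procedure.

```python
import math

def unit(v):
    """Normalize a direction to unit length."""
    norm = math.sqrt(sum(a * a for a in v))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return [a / norm for a in v]

def steer(hidden, direction, alpha):
    """Return hidden + alpha * unit(direction); alpha is an assumed scale."""
    u = unit(direction)
    return [h + alpha * d for h, d in zip(hidden, u)]

def probe_score(hidden, direction):
    """Projection of the hidden state onto the unit probe direction."""
    u = unit(direction)
    return sum(h * d for h, d in zip(hidden, u))

# Toy values, chosen only so the arithmetic is easy to follow.
h = [0.5, -1.0, 2.0]   # hypothetical hidden state
w = [3.0, 0.0, 4.0]    # hypothetical probe-derived direction
steered = steer(h, w, alpha=2.0)

before = probe_score(h, w)
after = probe_score(steered, w)
```

Steering shifts the probe score by exactly `alpha` by construction. That the score moves says nothing about whether the generated text changes, which is consistent with the abstract's reading of the signal as correlational rather than causal.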
