
The Phenomenology of Hallucinations

arXiv cs.AI / 3/17/2026


Key Points

  • The paper argues that language model hallucinations stem from a failure to integrate uncertainty into output generation, not from an inability to detect uncertainty.
  • Uncertain inputs occupy high-dimensional regions and are reliably detectable, but the uncertainty signal couples only weakly to the output layer: it migrates into low-sensitivity subspaces where it is geometrically amplified yet functionally silent.
  • Topological analysis shows uncertainty representations fragment rather than converge to a unified abstention state, with gradient and Fisher probes revealing collapsing sensitivity along the uncertainty direction.
  • Cross-entropy training rewards confident prediction and provides no incentive to abstain, so associative mechanisms amplify the fractured activations until the model commits to an output despite internal uncertainty; causal interventions that connect the uncertainty signal directly to the logits restore refusal.
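The detection-without-integration story above can be illustrated with a toy numpy sketch (all values synthetic; the paper's actual probe architecture and models are not specified here). A difference-of-means linear probe easily separates "uncertain" from "factual" activations, yet an unembedding constructed to be orthogonal to the uncertainty direction produces essentially no logit change — a detectable but functionally silent signal:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # toy hidden size

# Synthetic setup: "uncertain" activations are shifted along a direction u.
u = rng.normal(size=d)
u /= np.linalg.norm(u)
factual = rng.normal(size=(200, d))
uncertain = rng.normal(size=(200, d)) + 3.0 * u

# Difference-of-means probe: a standard linear-probing baseline.
probe = uncertain.mean(0) - factual.mean(0)
probe /= np.linalg.norm(probe)

# Detection: the probe separates the two classes well.
scores_f = factual @ probe
scores_u = uncertain @ probe
acc = ((scores_u > 1.5).mean() + (scores_f <= 1.5).mean()) / 2
print(f"probe separation accuracy ~ {acc:.2f}")

# "Functionally silent": a toy unembedding W whose rows are orthogonal
# to u, so the uncertainty shift moves activations but not logits.
W = rng.normal(size=(10, d))
W -= np.outer(W @ u, u)          # remove the component along u
logit_shift = np.abs(W @ (3.0 * u)).max()
print(f"max logit change from uncertainty shift: {logit_shift:.2e}")
```

The point of the sketch is the mismatch between the two printed numbers: the probe accuracy is high while the logit shift is numerically zero, mirroring the paper's claim that uncertainty is represented internally but decoupled from output generation.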

Abstract

We show that language models hallucinate not because they fail to detect uncertainty, but because of a failure to integrate it into output generation. Across architectures, uncertain inputs are reliably identified, occupying high-dimensional regions with 2–3× the intrinsic dimensionality of factual inputs. However, this internal signal is weakly coupled to the output layer: uncertainty migrates into low-sensitivity subspaces, becoming geometrically amplified yet functionally silent. Topological analysis shows that uncertainty representations fragment rather than converging to a unified abstention state, while gradient and Fisher probes reveal collapsing sensitivity along the uncertainty direction. Because cross-entropy training provides no attractor for abstention and uniformly rewards confident prediction, associative mechanisms amplify these fractured activations until residual coupling forces a committed output despite internal detection. Causal interventions confirm this account by restoring refusal when uncertainty is directly connected to logits.
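The abstract's closing claim — that refusal is restored when uncertainty is wired directly into the logits — can be sketched as a minimal causal intervention in numpy. Everything below is a synthetic illustration, not the paper's method: the refusal token index, the patch coefficient `alpha`, and the constructed unembedding are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, V = 64, 10          # toy hidden size and vocabulary size
REFUSE = 0             # index of a hypothetical refusal/abstention token

# Uncertainty direction u, and an unembedding W built to be blind to u —
# the weak-coupling regime the abstract describes.
u = rng.normal(size=d)
u /= np.linalg.norm(u)
W = rng.normal(size=(V, d))
W -= np.outer(W @ u, u)

# A factual activation (orthogonal to u by construction) and its
# uncertain counterpart, identical except for a large shift along u.
h_factual = rng.normal(size=d)
h_factual /= np.linalg.norm(h_factual)
h_factual -= (h_factual @ u) * u
h_uncertain = h_factual + 3.0 * u

def logits(h, alpha=0.0):
    """Readout, optionally patched so the refusal logit reads u directly."""
    z = W @ h
    z[REFUSE] += alpha * (h @ u)   # the causal intervention
    return z

# Without the patch the uncertainty shift is invisible to the logits;
# with it, the uncertain input flips to refusal while the factual one
# is unaffected (h_factual @ u == 0 by construction).
unpatched = logits(h_uncertain)
patched = logits(h_uncertain, alpha=5.0)
print("refusal logit:", round(unpatched[REFUSE], 3), "->", round(patched[REFUSE], 3))
print("patched uncertain argmax is refusal:", np.argmax(patched) == REFUSE)
```

The design point is that the intervention adds a single extra pathway from the uncertainty direction to one logit; it does not retrain anything, which is what makes the restored refusal evidence of detection-without-integration rather than missing information.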