Hallucination as Trajectory Commitment: Causal Evidence for Asymmetric Attractor Dynamics in Transformer Generation
arXiv cs.AI / 4/20/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper provides causal evidence that hallucinations in autoregressive language models behave like early trajectory commitments driven by asymmetric attractor dynamics.
- Using a same-prompt bifurcation method, the authors show that factual and hallucinated generations can diverge spontaneously at the very first generated token, quantified by large KL divergences between step-0 and step-1 next-token distributions (see the first sketch after this list).
- Activation patching across 28 layers reveals a strong causal asymmetry: perturbing a correct trajectory with a hallucinated activation disrupts the output far more often than perturbing a hallucinated trajectory with a correct activation.
- Windowed (multi-step) patching indicates that correcting a hallucination requires sustained intervention across multiple generation steps, whereas corruption can be triggered by a single perturbation (both patching variants are sketched below).
- Prompt-encoding residual states at step 0 predict each prompt’s hallucination rate, and clustering these states suggests distinct regime-like groups that concentrate the prompts which bifurcate into false premises (a probe sketch closes the examples below).
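
The bifurcation measurement can be illustrated with a minimal sketch. Everything concrete here is an assumption, not the paper's setup: gpt2 stands in for the studied model, and the factual/hallucinated continuations are hand-picked rather than sampled and labeled as the authors do.

```python
# Minimal sketch of a same-prompt bifurcation measurement.
# Assumptions: gpt2 as a stand-in model; hand-picked branch tokens.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # hypothetical stand-in for the paper's model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def next_token_dist(ids):
    """Next-token distribution after the given token sequence."""
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    return F.softmax(logits, dim=-1)

# Step-0 distribution, conditioned on the prompt alone.
prompt_ids = tok("The capital of Australia is", return_tensors="pt").input_ids
p0 = next_token_dist(prompt_ids)

# Step-1 distributions after two different first continuations: one factual,
# one hallucinated (chosen by hand here; the paper samples and labels them).
fact_ids = tok("The capital of Australia is Canberra", return_tensors="pt").input_ids
hall_ids = tok("The capital of Australia is Sydney", return_tensors="pt").input_ids
p_fact = next_token_dist(fact_ids)
p_hall = next_token_dist(hall_ids)

def kl(p, q, eps=1e-9):
    """KL(p || q) with a small epsilon for numerical stability."""
    return torch.sum(p * (torch.log(p + eps) - torch.log(q + eps))).item()

print("KL(step0 || factual step1):     ", kl(p0, p_fact))
print("KL(step0 || hallucinated step1):", kl(p0, p_hall))
print("KL(factual || hallucinated):    ", kl(p_fact, p_hall))
```

A large gap between the two branch distributions immediately after the first token, relative to the step-0 baseline, is one way to operationalize "commitment at the first generated token."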
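The patching asymmetry and its windowed variant admit a similar sketch, again hedged: the model, prompts, layer index, and window size are all illustrative, and PyTorch forward hooks on GPT-2 blocks stand in for whatever instrumentation the paper uses.

```python
# Hedged sketch of single-step and windowed activation patching between a
# "correct" and a "hallucinated" run. Layer 6 and window=3 are arbitrary.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def capture_activation(ids, layer):
    """Run the model and cache the block output at `layer`, last position."""
    cache = {}
    def hook(_module, _inputs, output):
        cache["h"] = output[0][:, -1, :].detach().clone()
    handle = model.transformer.h[layer].register_forward_hook(hook)
    with torch.no_grad():
        model(ids)
    handle.remove()
    return cache["h"]

def patched_logits(ids, layer, donor_h):
    """Re-run the model with the last-position activation at `layer`
    overwritten by `donor_h` taken from the other trajectory."""
    def hook(_module, _inputs, output):
        hidden = output[0].clone()
        hidden[:, -1, :] = donor_h
        return (hidden,) + output[1:]
    handle = model.transformer.h[layer].register_forward_hook(hook)
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    handle.remove()
    return logits

correct_ids = tok("The capital of Australia is Canberra", return_tensors="pt").input_ids
halluc_ids = tok("The capital of Australia is Sydney", return_tensors="pt").input_ids

layer = 6  # arbitrary mid-stack layer for illustration
h_halluc = capture_activation(halluc_ids, layer)
h_correct = capture_activation(correct_ids, layer)

# The reported asymmetry: corrupting the correct run (left) should change the
# output far more often than trying to repair the hallucinated run (right).
corrupted = patched_logits(correct_ids, layer, h_halluc)
repaired = patched_logits(halluc_ids, layer, h_correct)
print(tok.decode(corrupted.argmax().item()), "|", tok.decode(repaired.argmax().item()))

def generate_with_window_patch(ids, layer, donor_h, window=3, max_new=8):
    """Windowed variant: greedy generation that keeps overwriting the layer
    activation for the first `window` steps. Donating one fixed state is a
    simplification; the paper's multi-step protocol would donate states
    matched to each generation step."""
    out = ids.clone()
    for step in range(max_new):
        if step < window:
            logits = patched_logits(out, layer, donor_h)
        else:
            with torch.no_grad():
                logits = model(out).logits[0, -1]
        next_id = logits.argmax().reshape(1, 1)
        out = torch.cat([out, next_id], dim=1)
    return tok.decode(out[0])

print(generate_with_window_patch(halluc_ids, layer, h_correct))
```

Comparing `window=1` against larger windows is the natural way to probe the claim that repair needs sustained intervention while corruption needs only a single perturbation.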
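Finally, the step-0 prediction result suggests a simple linear-probe analysis. The prompts and hallucination rates below are dummy placeholders (the paper would estimate rates by sampling many generations per prompt and labeling them); only the shape of the analysis, residual state to probe to clusters, follows the key point.

```python
# Hedged sketch of the step-0 probe and clustering. Dummy labels only.
import numpy as np
import torch
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def step0_residual(prompt):
    """Final-layer hidden state at the last prompt token: the residual state
    encoding the prompt before any token is generated ('step 0')."""
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, output_hidden_states=True)
    return out.hidden_states[-1][0, -1].numpy()

prompts = [
    "The capital of Australia is",
    "The Eiffel Tower is located in",
    "The author of Hamlet is",
    "The chemical symbol for gold is",
    "The largest planet in the solar system is",
    "The Great Wall is located in",
]
# Placeholder per-prompt hallucination rates, for illustration only.
rng = np.random.default_rng(0)
halluc_rates = rng.uniform(0.0, 1.0, size=len(prompts))

X = np.stack([step0_residual(p) for p in prompts])

# Linear probe: does the step-0 state predict the hallucination rate?
probe = Ridge(alpha=1.0).fit(X, halluc_rates)
print("train R^2:", probe.score(X, halluc_rates))  # a real study would cross-validate

# Regime-like structure: cluster the step-0 states and check whether
# high-rate prompts concentrate in particular clusters.
groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("cluster assignments:", groups)
```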