On the Invariants of Softmax Attention
arXiv cs.LG / 5/6/2026
📰 News · Ideas & Deep Analysis · Models & Research
Key Points
- The paper introduces the “energy field” (the matrix of row-centered attention logits) to study which structural invariants persist inside softmax attention across different models and inputs; a minimal computation is sketched after this list.
- It derives mechanism-level invariants directly from the algebra of softmax, including a per-row zero-sum constraint on the energy field, a rank bound tied to the head dimension, and related spectral signatures; the first sketch below checks the two algebraic constraints.
- It also reports model-level regularities that are not enforced by the attention mechanism itself but appear consistently across the autoregressive language model families tested.
- The energy field’s variance is shown to be delocalized across key positions, a behavior attributed to a property of the key matrix called “key incoherence,” which in turn supports a per-head training monitor; the second sketch below illustrates one such monitor.
- The results are validated across multiple context lengths and different input texts, indicating the invariants are robust rather than instance-specific.
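
A minimal sketch of the energy field and the two algebraic invariants above, using randomly generated Q and K in place of a real model’s projections. The shapes and `numpy` setup are illustrative assumptions, not the paper’s reference implementation. The key facts it relies on: softmax is unchanged when a constant is added to every logit in a row, and row-centering is a right-multiplication by a projection matrix, so it cannot raise the rank of the logits above the head dimension.

```python
# Illustrative check of the energy field's per-row zero-sum and rank-bound
# invariants on random data (not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_head = 64, 16
Q = rng.standard_normal((seq_len, d_head))
K = rng.standard_normal((seq_len, d_head))

# Scaled dot-product attention logits; rank is at most d_head.
logits = (Q @ K.T) / np.sqrt(d_head)

# Energy field: subtract each row's mean from the logits. Attention weights
# are unchanged, since a per-row constant cancels in softmax normalization.
energy = logits - logits.mean(axis=1, keepdims=True)

# Invariant 1: every row of the energy field sums to (numerically) zero.
assert np.allclose(energy.sum(axis=1), 0.0, atol=1e-9)

# Invariant 2: centering multiplies the logits by a projection on the
# right, so rank(energy) <= rank(logits) <= d_head.
rank = int(np.linalg.matrix_rank(energy))
assert rank <= d_head
print(f"all rows sum to ~0; rank(energy) = {rank} <= d_head = {d_head}")
```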
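And a hypothetical per-head monitor in the same spirit: it scores how evenly the energy field’s variance is spread over key positions and how mutually coherent the key vectors are. The participation-ratio and max-cosine statistics here are illustrative choices; the paper’s actual monitor may use different quantities.

```python
# Sketch of a per-head training monitor based on the delocalization of the
# energy field's variance across key positions (illustrative, not the
# paper's exact statistic).
import numpy as np

def head_monitor(Q: np.ndarray, K: np.ndarray) -> dict:
    """Return delocalization statistics for one attention head."""
    d_head = Q.shape[-1]
    logits = (Q @ K.T) / np.sqrt(d_head)
    energy = logits - logits.mean(axis=1, keepdims=True)

    # Per-key-position share of the energy field's total variance.
    col_var = energy.var(axis=0)
    share = col_var / col_var.sum()

    # Participation ratio: ~seq_len when variance is spread evenly over
    # key positions (delocalized), ~1 when a single key position dominates.
    participation = 1.0 / np.sum(share ** 2)

    # Key coherence: largest off-diagonal cosine similarity between key
    # vectors; small values ("key incoherence") mean near-orthogonal keys.
    K_unit = K / np.linalg.norm(K, axis=1, keepdims=True)
    gram = K_unit @ K_unit.T
    np.fill_diagonal(gram, 0.0)
    coherence = float(np.abs(gram).max())

    return {"participation_ratio": float(participation),
            "key_coherence": coherence}

# A healthy random head shows a high participation ratio and modest
# key coherence; collapsing heads would drift toward the opposite.
rng = np.random.default_rng(1)
print(head_monitor(rng.standard_normal((64, 16)),
                   rng.standard_normal((64, 16))))
```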