On the Invariants of Softmax Attention
arXiv cs.LG / 5/6/2026
📰 News · Ideas & Deep Analysis · Models & Research
Key Points
- The paper introduces the “energy field” (the matrix of row-centered attention logits) to study which structural invariants persist inside softmax attention across different models and inputs; a minimal computation is sketched after this list.
- It derives mechanism-level invariants directly from the algebra of softmax, including a per-row zero-sum constraint on the energy field, a rank bound tied to the head dimension, and related spectral signatures; the first sketch below checks the two algebraic constraints.
- It also reports model-level regularities that are not enforced by the attention mechanism itself but appear consistently across the autoregressive language model families tested.
- The energy field’s variance is shown to be delocalized across key positions, a behavior attributed to a property of the key matrix called “key incoherence,” which in turn supports a per-head training monitor; the second sketch below illustrates one such monitor.
- The results are validated across multiple context lengths and different input texts, indicating the invariants are robust rather than instance-specific.
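
A minimal sketch of the energy field and the two algebraic invariants above, using randomly generated Q and K in place of a real model’s projections. The shapes and `numpy` setup are illustrative assumptions, not the paper’s reference implementation. The key facts it relies on: softmax is unchanged when a constant is added to every logit in a row, and row-centering is a right-multiplication by a projection matrix, so it cannot raise the rank of the logits above the head dimension.

```python
# Illustrative check of the energy field's per-row zero-sum and rank-bound
# invariants on random data (not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_head = 64, 16
Q = rng.standard_normal((seq_len, d_head))
K = rng.standard_normal((seq_len, d_head))

# Scaled dot-product attention logits; rank is at most d_head.
logits = (Q @ K.T) / np.sqrt(d_head)

# Energy field: subtract each row's mean from the logits. Attention weights
# are unchanged, since a per-row constant cancels in softmax normalization.
energy = logits - logits.mean(axis=1, keepdims=True)

# Invariant 1: every row of the energy field sums to (numerically) zero.
assert np.allclose(energy.sum(axis=1), 0.0, atol=1e-9)

# Invariant 2: centering multiplies the logits by a projection on the
# right, so rank(energy) <= rank(logits) <= d_head.
rank = int(np.linalg.matrix_rank(energy))
assert rank <= d_head
print(f"all rows sum to ~0; rank(energy) = {rank} <= d_head = {d_head}")
```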
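And a hypothetical per-head monitor in the same spirit: it scores how evenly the energy field’s variance is spread over key positions and how mutually coherent the key vectors are. The participation-ratio and max-cosine statistics here are illustrative choices; the paper’s actual monitor may use different quantities.

```python
# Sketch of a per-head training monitor based on the delocalization of the
# energy field's variance across key positions (illustrative, not the
# paper's exact statistic).
import numpy as np

def head_monitor(Q: np.ndarray, K: np.ndarray) -> dict:
    """Return delocalization statistics for one attention head."""
    d_head = Q.shape[-1]
    logits = (Q @ K.T) / np.sqrt(d_head)
    energy = logits - logits.mean(axis=1, keepdims=True)

    # Per-key-position share of the energy field's total variance.
    col_var = energy.var(axis=0)
    share = col_var / col_var.sum()

    # Participation ratio: ~seq_len when variance is spread evenly over
    # key positions (delocalized), ~1 when a single key position dominates.
    participation = 1.0 / np.sum(share ** 2)

    # Key coherence: largest off-diagonal cosine similarity between key
    # vectors; small values ("key incoherence") mean near-orthogonal keys.
    K_unit = K / np.linalg.norm(K, axis=1, keepdims=True)
    gram = K_unit @ K_unit.T
    np.fill_diagonal(gram, 0.0)
    coherence = float(np.abs(gram).max())

    return {"participation_ratio": float(participation),
            "key_coherence": coherence}

# A healthy random head shows a high participation ratio and modest
# key coherence; collapsing heads would drift toward the opposite.
rng = np.random.default_rng(1)
print(head_monitor(rng.standard_normal((64, 16)),
                   rng.standard_normal((64, 16))))
```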