On the Geometry of Positional Encodings in Transformers
arXiv cs.LG / 4/8/2026
Key Points
- The paper argues that positional encodings need a principled mathematical theory rather than trial-and-error design, and develops such a framework for Transformers.
- It proves that Transformers lacking any positional signal cannot solve tasks whose outcomes depend on word order (Necessity Theorem).
- Under mild, verifiable conditions, it shows that training yields distinct vector representations for different sequence positions at every global minimizer (Positional Separation Theorem).
- It formulates an information-optimal encoding objective by constructing an embedding via classical multidimensional scaling (MDS) on the Hellinger distances between positional distributions, using "stress" as a single quality metric (a small numerical sketch follows this list).
- The work demonstrates that the optimal encoding has an effective rank r ≤ n−1 and admits a parameter-efficient representation, with experiments suggesting that ALiBi attains much lower stress than sinusoidal encodings and RoPE, consistent with a rank-1 interpretation.
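To make the MDS-on-Hellinger construction concrete, here is a minimal numpy sketch. The geometric-decay positional distributions `P`, the sequence length `n = 16`, and the scale-fitting step for the sinusoidal baseline are illustrative assumptions rather than the paper's actual setup; only the Hellinger distance, classical MDS, and stress formulas themselves are standard.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def classical_mds(D, dim):
    """Classical (Torgerson) MDS: embed an n x n distance matrix in `dim` dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix (rank n - 1)
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:dim]    # keep the largest eigenvalues
    scales = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scales, eigvals

def stress(X, D, fit_scale=False):
    """Normalized stress between an embedding's pairwise distances and a target D."""
    D_hat = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    if fit_scale:                              # least-squares optimal global scale
        D_hat = D_hat * (np.sum(D * D_hat) / np.sum(D_hat ** 2))
    return np.sqrt(np.sum((D - D_hat) ** 2) / np.sum(D ** 2))

def sinusoidal_encoding(n, d_model):
    """Standard Transformer sinusoidal positional encoding, shape (n, d_model)."""
    pos = np.arange(n)[:, None]
    dim = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (dim // 2)) / d_model)
    return np.where(dim % 2 == 0, np.sin(angles), np.cos(angles))

n = 16
# Placeholder "positional distributions": position i attends with geometric decay.
P = np.array([[0.5 ** abs(i - j) for j in range(n)] for i in range(n)], dtype=float)
P /= P.sum(axis=1, keepdims=True)

# Target geometry: pairwise Hellinger distances between positional distributions.
D = np.array([[hellinger(P[i], P[j]) for j in range(n)] for i in range(n)])

# MDS embedding and its stress; double centering annihilates one direction, so at
# most n - 1 eigenvalues are nonzero, matching the effective-rank bound r <= n - 1.
X, spectrum = classical_mds(D, dim=n - 1)
print("MDS stress:          ", stress(X, D))
print("positive eigenvalues:", int(np.sum(spectrum > 1e-10)))

# Scoring a fixed encoding against the same target (scale fitted, since the
# paper's normalization convention is not reproduced here).
PE = sinusoidal_encoding(n, d_model=32)
print("sinusoidal stress:   ", stress(PE, D, fit_scale=True))
```

One reason this setup is well behaved: the Hellinger distance is, up to a constant factor, a Euclidean distance between the square roots of the distributions, so the double-centered matrix B is positive semidefinite and the MDS embedding reproduces the target distances almost exactly. Candidate encodings are then judged by how far their pairwise geometry departs from that target, which is the role stress plays in the comparison sketched above.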