On the Expressive Power of Contextual Relations in Transformers
arXiv cs.LG · March 30, 2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper argues that although Transformers capture contextual relationships well empirically, their expressive power for such relations has not been fully characterized mathematically.
- It proposes a measure-theoretic framework in which texts are modeled as probability measures on a semantic embedding space, and contextual relations between texts are represented as coupling measures, i.e., joint measures whose marginals recover the two texts (see the formal sketch after this list).
- The authors introduce the “Sinkhorn Transformer,” a transformer-like architecture designed for this coupling-measure setting (a minimal numerical sketch of the underlying Sinkhorn step follows below).
- The main contribution is a universal approximation theorem: continuous coupling functions between probability measures can be uniformly approximated by a Sinkhorn Transformer with suitably chosen parameters.
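For reference, the standard optimal-transport notion of a coupling, on which such a framework builds, can be written as follows. The entropic-regularization form shown is the usual setting in which Sinkhorn iterations arise; the paper's exact formulation may differ.

```latex
% A coupling of probability measures \mu and \nu on an embedding space X
% is a joint measure \pi whose marginals are \mu and \nu:
\Pi(\mu,\nu) = \{\, \pi \in \mathcal{P}(X \times X) :
  \pi(A \times X) = \mu(A),\ \pi(X \times B) = \nu(B) \,\}
% Entropy-regularized optimal transport selects the coupling
\pi^{\star} = \arg\min_{\pi \in \Pi(\mu,\nu)}
  \int c(x,y)\, d\pi(x,y) + \varepsilon\, \mathrm{KL}\!\left(\pi \,\middle\|\, \mu \otimes \nu\right)
% whose discrete form is solved by Sinkhorn's matrix-scaling iterations.
```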
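As a concrete illustration of the Sinkhorn step that presumably underlies the architecture, here is a minimal NumPy sketch computing an entropic coupling between two texts represented as empirical measures over token embeddings. This is an illustrative sketch under those assumptions, not the paper's implementation; `sinkhorn_coupling` and its parameter choices are hypothetical.

```python
import numpy as np

def sinkhorn_coupling(X, Y, a=None, b=None, eps=0.1, n_iters=100):
    """Entropy-regularized coupling between two empirical measures.

    X: (n, d) token embeddings of text 1 (atoms of measure mu)
    Y: (m, d) token embeddings of text 2 (atoms of measure nu)
    a, b: atom weights (default: uniform)
    Returns P: (n, m) coupling matrix with row sums ~ a, column sums ~ b.
    """
    n, m = X.shape[0], Y.shape[0]
    a = np.full(n, 1.0 / n) if a is None else a
    b = np.full(m, 1.0 / m) if b is None else b
    # Squared Euclidean ground cost between embedding atoms.
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    K = np.exp(-C / eps)                  # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):              # alternating marginal scaling
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]    # P = diag(u) K diag(v)

# Example: two "texts" as point clouds in a 4-d embedding space.
rng = np.random.default_rng(0)
P = sinkhorn_coupling(rng.normal(size=(5, 4)), rng.normal(size=(7, 4)))
print(P.shape, P.sum())                   # (5, 7), total mass ~ 1.0
```

In a Sinkhorn-style attention layer, this alternating row/column scaling would replace softmax's one-sided row normalization, yielding an (approximately) doubly stochastic attention matrix that can be read as a discrete coupling, which is what connects the architecture to the coupling-measure framework above.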