Characterizing the Expressivity of Local Attention in Transformers
arXiv cs.CL / 5/4/2026
Key Points
- The paper studies why adding local attention to transformers can improve model quality, even though local attention was introduced mainly for efficiency.
- It provides a formal explanation using “recognizer expressivity,” linking fixed-precision global-attention transformers to a fragment of linear temporal logic with a single past operator.
- The authors prove that introducing local attention adds a second temporal operator, which strictly expands the class of regular languages the model can recognize.
- They show that global and local attention are expressively complementary (neither alone subsumes the other) and that combining both yields the richest expressivity; see the sketch after this list.
- Experiments in formal language recognition and natural language modeling confirm that hybrid global-local transformers outperform global-only models.
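To make the global/local distinction concrete, below is a minimal sketch, not taken from the paper, contrasting a standard global causal attention mask with a sliding-window local mask. The NumPy helpers `global_causal_mask` and `local_causal_mask` and the window size are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: the paper's claims are about expressivity,
# but the masks below show what "global" vs "local" attention means mechanically.
import numpy as np

def global_causal_mask(seq_len: int) -> np.ndarray:
    """Each position may attend to every earlier position (and itself)."""
    i = np.arange(seq_len)[:, None]  # query index
    j = np.arange(seq_len)[None, :]  # key index
    return j <= i

def local_causal_mask(seq_len: int, window: int) -> np.ndarray:
    """Each position may attend only to itself and the previous `window - 1` positions."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

if __name__ == "__main__":
    n, w = 6, 2  # hypothetical sequence length and window size
    print("global causal mask:\n", global_causal_mask(n).astype(int))
    print(f"local causal mask (window={w}):\n", local_causal_mask(n, w).astype(int))
```

A hybrid model would interleave layers using each mask type; per the key points above, that combination is what the paper reports as strictly more expressive than either mechanism alone.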