Do All Vision Transformers Need Registers? A Cross-Architectural Reassessment
arXiv cs.LG / 3/30/2026
Key Points
- The paper addresses training artifacts in Vision Transformer (ViT) attention maps and how these artifacts affect interpretability.
- It reproduces prior work that proposes adding extra learnable "register" tokens to the input sequence, giving the model dedicated slots for global information beyond the [CLS] token, and shows that the approach can improve attention-map clarity.
- The authors reassess generalizability across several vision transformer families (including DINO, DINOv2, OpenCLIP, and DeiT3) and find that some earlier claims are not universal.
- They investigate how model size changes the findings, extending the discussion to smaller models.
- The work also resolves terminology inconsistencies from the original paper and explains how those differences can mislead cross-model comparisons.
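The register-token idea summarized above can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the paper's implementation: all dimensions, the module name, and the exact placement of register tokens (appended after positional embedding, then discarded at the output) are illustrative choices.

```python
import torch
import torch.nn as nn

class ViTWithRegisters(nn.Module):
    """Sketch of a ViT-style encoder with learnable "register" tokens.

    Registers participate in self-attention like any other token, giving
    the model scratch space for global information beyond [CLS], but they
    are dropped from the output so downstream heads never see them.
    Sizes here are toy values, not from the paper.
    """

    def __init__(self, dim=64, num_patches=16, num_registers=4, depth=2, heads=4):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.registers = nn.Parameter(torch.zeros(1, num_registers, dim))
        # Positional embeddings cover [CLS] + patches only; registers get none.
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.num_registers = num_registers

    def forward(self, patches):  # patches: (batch, num_patches, dim)
        b = patches.size(0)
        x = torch.cat([self.cls_token.expand(b, -1, -1), patches], dim=1)
        x = x + self.pos_embed
        # Append register tokens so they attend alongside [CLS] and patches.
        x = torch.cat([x, self.registers.expand(b, -1, -1)], dim=1)
        x = self.encoder(x)
        # Discard register outputs; keep [CLS] + patch tokens.
        return x[:, : x.size(1) - self.num_registers]

model = ViTWithRegisters()
out = model(torch.randn(2, 16, 64))
print(out.shape)  # (2, 17, 64): [CLS] + 16 patch tokens, registers removed
```

Because the registers are removed before the output, the token count seen by any classification or segmentation head is unchanged; only the attention computation inside the encoder gains the extra slots.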