A Mechanistic Account of Attention Sinks in GPT-2: One Circuit, Broader Implications for Mitigation

arXiv cs.LG / 4/17/2026


Key Points

  • The paper analyzes the well-known “attention sink” behavior in GPT-2-style transformers, where the model disproportionately attends to the first token position.
  • Using structural analysis and causal interventions, the authors identify a specific contributing interaction between a learned query bias, the first-layer MLP’s processing of absolute positional encodings, and structure in the key projection.
  • The results are validated across diverse input types, including natural language, mathematical expressions, and code, suggesting the phenomenon is robust.
  • Importantly, the authors show that architectures lacking any one of the identified components still exhibit attention sinks, implying that sinks can be produced by different circuits in different model architectures.
  • The study provides guidance for designing and evaluating mitigation strategies, while motivating further research into the underlying reasons attention sinks emerge.

Abstract

Transformers commonly exhibit an attention sink: disproportionately high attention to the first position. We study this behavior in GPT-2-style models with learned query biases and absolute positional embeddings. Combining structural analysis with causal interventions, validated across natural-language, mathematical, and code inputs, we find that the sink arises from the interaction among (i) a learned query bias, (ii) the first-layer MLP transformation of the positional encoding, and (iii) structure in the key projection. Crucially, each component we identify is individually dispensable: architectures omitting each of them robustly exhibit sinks. This indicates that attention sinks may arise through distinct circuits across architectures. These findings inform mitigation of sinks, and motivate broader investigation into why sinks emerge.
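The mechanism described above can be illustrated with a minimal, self-contained sketch. This is not the paper's code: the "learned query bias" and the bias-aligned first-position key (standing in for the MLP-transformed positional encoding routed through the key projection) are hypothetical stand-ins, chosen only to show how such an alignment concentrates softmax attention on position 0.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d, T = 16, 8  # head dimension, sequence length

# Content-dependent queries and keys (random stand-ins).
Q = rng.normal(size=(T, d))
K = rng.normal(size=(T, d))

# Hypothetical "sink direction": a query bias shared by all positions,
# with the first token's key pushed along the same direction.
b = rng.normal(size=d)
b /= np.linalg.norm(b)
K[0] += 6.0 * b

# Biased queries dot bias-aligned first key -> one large logit at position 0.
scores = (Q + 4.0 * b) @ K.T / np.sqrt(d)
attn = softmax(scores)

print(attn[:, 0])  # attention mass each query places on position 0
```

Because the bias term contributes a large, content-independent logit only at position 0, every row of `attn` concentrates there, mimicking a sink; removing either the query bias or the key alignment collapses the effect, which is the kind of ablation the causal interventions formalize.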