Revisiting Anisotropy in Language Transformers: The Geometry of Learning Dynamics

arXiv cs.CL / 4/13/2026


Key Points

  • The paper revisits anisotropy in Transformer-based language models, arguing that it complicates geometric interpretations of learning dynamics.
  • It provides geometric explanations for how frequency-biased sampling reduces “curvature visibility” and why training tends to amplify tangent directions.
  • The authors introduce an empirical method that uses concept-based mechanistic interpretability during training to fit low-rank tangent proxies derived from activations.
  • These activation-derived tangent directions are evaluated against true gradients from standard backpropagation, showing they capture disproportionately large gradient energy and a larger portion of gradient anisotropy than matched controls.
  • Results are reported across both encoder-style and decoder-style language models, supporting a tangent-aligned account of anisotropy.
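The core empirical test above can be illustrated with a toy sketch. The snippet below is not the paper's implementation; all names and dimensions are illustrative assumptions. It fits a rank-k subspace to synthetic "activations" via SVD, then measures what fraction of the energy of synthetic "gradients" that subspace captures, compared with a matched-rank random control:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the paper's quantities (all names/dimensions illustrative):
# activations A (n x d) and per-step gradient vectors G (m x d).
d, n, m, k = 64, 500, 200, 8

# Activations with a dominant low-rank structure plus noise.
basis = np.linalg.qr(rng.normal(size=(d, k)))[0]       # (d, k) orthonormal
A = rng.normal(size=(n, k)) @ basis.T + 0.1 * rng.normal(size=(n, d))

# Gradients partially aligned with the same directions (tangent-aligned toy case).
G = rng.normal(size=(m, k)) @ basis.T + 0.5 * rng.normal(size=(m, d))

def topk_subspace(X, k):
    """Rank-k proxy: top right-singular vectors of the centered matrix."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T                                    # (d, k) orthonormal

def captured_energy(G, U):
    """Fraction of total gradient energy lying in span(U)."""
    return np.sum((G @ U) ** 2) / np.sum(G ** 2)

U_act = topk_subspace(A, k)                            # activation-derived proxy
U_ctrl = np.linalg.qr(rng.normal(size=(d, k)))[0]      # matched-rank random control

print(captured_energy(G, U_act), captured_energy(G, U_ctrl))
```

In this synthetic setup the activation-derived subspace captures far more gradient energy than the random control, mirroring the kind of comparison the paper reports.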

Abstract

Since their introduction, Transformer architectures have dominated Natural Language Processing (NLP). However, recent research has highlighted an inherent anisotropy phenomenon in these models, presenting a significant challenge to their geometric interpretation. Previous theoretical studies of this phenomenon are rarely grounded in the underlying representation geometry. In this paper, we extend this line of work by deriving geometric arguments for how frequency-biased sampling attenuates curvature visibility and why training preferentially amplifies tangent directions. Empirically, we then use concept-based mechanistic interpretability during training, rather than only post hoc, to fit activation-derived low-rank tangent proxies and test them against true gradients obtained by ordinary backpropagation. Across encoder-style and decoder-style language models, we find that these activation-derived directions capture both unusually large gradient energy and a substantially larger share of gradient anisotropy than matched-rank random controls, providing strong empirical support for a tangent-aligned account of anisotropy.
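The abstract's notion of "gradient anisotropy" can be quantified in a standard way: the average pairwise cosine similarity of a set of vectors (near 0 for an isotropic cloud, near 1 when all vectors share a common direction). The sketch below assumes this metric; the paper may use a different estimator:

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_pairwise_cosine(X):
    """A common anisotropy proxy: average cosine similarity over all
    pairs of row vectors (~0 = isotropic, approaching 1 = anisotropic)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    n = len(X)
    return (S.sum() - n) / (n * (n - 1))   # exclude the diagonal self-similarities

# Isotropic baseline vs. vectors sharing a common offset direction.
iso = rng.normal(size=(300, 64))
aniso = iso + 3.0 * rng.normal(size=(1, 64))

print(mean_pairwise_cosine(iso), mean_pairwise_cosine(aniso))
```

With this metric, the "share of gradient anisotropy" captured by a subspace can be read off by comparing the statistic on raw gradients against gradients projected into that subspace.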