Concepts Whisper While Syntax Shouts: Spectral Anti-Concentration and the Dual Geometry of Transformer Representations

arXiv cs.LG / 5/5/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper tests whether a causal inner product (defined by the unembedding covariance) supports cross-lingual concept transport across 17 transformer models and 4 language pairs, finding the method’s effect indistinguishable from spectral regularization alone (p = 0.95); a minimal sketch of this inner product follows the list.
  • Instead of confirming the specific causal-transport mechanism, the authors find strong evidence of anti-concentration in residual-stream “difference-of-means” vectors across five architecture families (p < 10^-33), corroborated by SAE features and linear probes on Gemma and Llama.
  • The study uncovers a “dual geometry” in transformer representations: concept-like directions anti-concentrate in the spectral tail of activation space, while static unembedding-row contrasts concentrate in high-variance directions.
  • Using split-injection causal interventions and POS-tag probing across 8 models, the authors show syntax tends to be preferentially encoded in the high-variance subspace in 6/8 architectures, with Qwen 2.5 exhibiting a reversal consistent with architecture-specific spectral structure.
  • Overall, the results suggest transformers may shift semantic content into spectrally quieter regions during contextualized processing, enabling concept manipulation with less grammatical disruption.
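As a rough illustration of the causal inner product referenced in the first point, the sketch below whitens vectors with the inverse unembedding covariance Σ before comparing them, i.e. ⟨a, b⟩_C = aᵀ Σ⁻¹ b. The random unembedding matrix, dimensions, and variable names are stand-ins for illustration, not the paper’s code.

```python
import numpy as np

# Minimal sketch of a causal inner product defined by the unembedding
# covariance Sigma (as in Park et al., 2024): whiten both vectors with
# Sigma^{-1/2}, then take the ordinary Euclidean inner product.
# All quantities below are synthetic stand-ins for illustration.

rng = np.random.default_rng(0)
d_model, vocab = 64, 1000
U = rng.normal(size=(vocab, d_model))          # stand-in unembedding matrix

# Unembedding covariance Sigma and its inverse square root (whitening map).
Sigma = np.cov(U, rowvar=False) + 1e-6 * np.eye(d_model)
evals, evecs = np.linalg.eigh(Sigma)
W = evecs @ np.diag(evals ** -0.5) @ evecs.T   # Sigma^{-1/2}

def causal_inner(a, b):
    """<a, b>_C = a^T Sigma^{-1} b, the Euclidean inner product of W a and W b."""
    return (W @ a) @ (W @ b)

# Compare two concept directions under the causal vs. raw Euclidean geometry.
a, b = rng.normal(size=d_model), rng.normal(size=d_model)
cos_causal = causal_inner(a, b) / np.sqrt(causal_inner(a, a) * causal_inner(b, b))
cos_euclid = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"causal cosine {cos_causal:.3f} vs Euclidean cosine {cos_euclid:.3f}")
```

In this geometry, two directions can look aligned or orthogonal quite differently than under the raw dot product, which is exactly the property the paper stress-tests against plain spectral regularization.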

Abstract

We test whether the causal inner product of Park et al. (2024), defined by the unembedding covariance Σ, enables cross-lingual concept transport. Across 17 models and 4 language pairs, a matched-spectrum randomization test finds that Whitened Causal Alignment is indistinguishable from spectral regularization alone (p = 0.95). However, this failure reveals a broader phenomenon: anti-concentration is observed in residual-stream difference-of-means vectors across five architecture families (p < 10^-33) and supported by SAE features (e.g., p = 4.5 × 10^-19) and linear probes on Gemma and Llama. We discover a "dual geometry": activation-space concept directions anti-concentrate in the spectral tail, while static unembedding-row contrasts concentrate in high-variance directions (p < 10^-4). Split-injection causal interventions support the functional basis on Gemma and Llama (Cohen's d up to 1.80), and POS-tag probing across 8 models shows syntax preferentially encodes in the high-variance subspace in 6 of 8 architectures (p < 0.013), with the Qwen 2.5 family showing a significant reversal consistent with architecture-specific spectral structure. These results suggest transformers may rotate semantic content into spectrally quiet regions during contextualized processing, encoding concepts where they can be manipulated with reduced grammatical disruption.
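To make the anti-concentration claim concrete, here is a hedged sketch of one way to ask where a difference-of-means direction lives spectrally: project it onto the eigenbasis of the activation covariance and measure how much of its squared mass falls in the top (high-variance) components versus the tail. The synthetic activations, the top-k cutoff, and all names below are assumptions for illustration; the paper's exact statistic and test may differ.

```python
import numpy as np

# Illustrative check of spectral (anti-)concentration for a
# difference-of-means concept direction, using Gaussian stand-ins
# for residual-stream activations with a decaying spectrum.

rng = np.random.default_rng(0)
d_model, n = 128, 5000

# Two groups of synthetic activations; the "concept" shifts the mean
# only in the low-variance half of the coordinates.
scales = np.linspace(3.0, 0.1, d_model)
acts_a = rng.normal(size=(n, d_model)) * scales
acts_b = rng.normal(size=(n, d_model)) * scales
acts_b[:, d_model // 2:] += 0.5

# Difference-of-means concept direction, unit-normalized.
v = acts_b.mean(axis=0) - acts_a.mean(axis=0)
v /= np.linalg.norm(v)

# Eigenbasis of the pooled activation covariance, sorted high-to-low variance.
cov = np.cov(np.vstack([acts_a, acts_b]), rowvar=False)
evals, evecs = np.linalg.eigh(cov)
order = np.argsort(evals)[::-1]
evecs = evecs[:, order]

# Fraction of the direction's squared mass captured by the top-k components;
# a small value here indicates anti-concentration (mass in the spectral tail).
k = d_model // 10
coords = evecs.T @ v
top_mass = np.sum(coords[:k] ** 2)
print(f"mass in top {k} of {d_model} components: {top_mass:.3f}")
```

Under this toy setup the concept direction carries little mass in the top components, which is the qualitative pattern the paper reports for contextualized concept directions (and the opposite of what it reports for static unembedding-row contrasts).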