Stream separation improves Bregman conditioning in transformers

arXiv cs.LG · March 24, 2026


Key Points

  • The paper argues that linear steering approaches for transformer representations typically assume a Euclidean geometry. In fact, softmax induces a curved Bregman geometry whose metric tensor is a Hessian, and ignoring this curvature can leak probability mass to unintended tokens.
  • It extends the analysis beyond the output layer, measuring the conditioning of this Hessian metric at intermediate layers in a controlled 2×2 experiment crossing stream separation with per-layer supervision via a vocabulary decoding loss.
  • In standard single-stream transformers, the Hessian metric at intermediate layers is highly degenerate, while stream separation improves conditioning by up to 22× in effective rank (with matched model size and vocabulary).
  • The study finds that per-layer supervision further helps, though less than stream separation, and that the cosine similarity between “primal” and “dual” concept directions predicts steering effectiveness across downstream tasks with a threshold around 0.3.
  • The results are framed as implications for the reliability of linear safety interventions, since such methods depend on the geometry being well-conditioned at the layer where steering is applied.
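To make the "metric conditioning" claim concrete, here is a minimal sketch of the quantity being measured. For softmax, the Hessian of the log-normalizer has the closed form H(λ) = Cov[γ | λ] = diag(p) − p pᵀ with p = softmax(λ); the entropy-based effective rank below (the exponential of the eigenvalue-spectrum entropy) is one standard definition and an assumption on our part, since the paper may use a different estimator:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def softmax_hessian(lam):
    """Hessian of the log-normalizer log sum_i exp(lam_i):
    H(lam) = Cov[gamma | lam] = diag(p) - p p^T, p = softmax(lam)."""
    p = softmax(lam)
    return np.diag(p) - np.outer(p, p)

def effective_rank(H, eps=1e-12):
    """Entropy-based effective rank: exp of the Shannon entropy of the
    normalized eigenvalue spectrum (an assumed choice of estimator)."""
    ev = np.clip(np.linalg.eigvalsh(H), 0.0, None)
    q = ev / (ev.sum() + eps)
    q = q[q > eps]
    return float(np.exp(-(q * np.log(q)).sum()))

# Uniform logits: eigenvalues are 1/n (x n-1) plus one zero mode along
# the all-ones direction, so the effective rank is ~n-1.
print(effective_rank(softmax_hessian(np.zeros(64))))   # ≈ 63

# Peaked logits: probability mass concentrates, the covariance collapses,
# and the effective rank drops well below the ambient dimension.
lam = np.zeros(64)
lam[0] = 10.0
print(effective_rank(softmax_hessian(lam)))
```

A degenerate H at some layer means Euclidean steering there moves mostly through near-null directions of the metric, which is the failure mode the paper ties to probability-mass leakage.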

Abstract

Linear methods for steering transformer representations, including probing, activation engineering, and concept erasure, implicitly assume the geometry of representation space is Euclidean. Park et al. [2026] showed that softmax induces a curved Bregman geometry whose metric tensor is the Hessian of the log-normalizer, H(λ) = Cov[γ | λ]. Ignoring this curvature causes Euclidean steering to leak probability mass to unintended tokens. Their analysis applies at the output layer. We measure this Hessian at intermediate layers in a controlled 2×2 design crossing stream separation with per-layer supervision (vocabulary decoding loss at each layer), all at matched vocabulary and parameter count. In standard single-stream transformers, H is severely degenerate at intermediate layers (effective rank 8 in 516 dimensions). Stream separation improves conditioning by up to 22× in effective rank, even without auxiliary supervision. Per-layer supervision helps, but less. The cosine similarity between primal and dual concept directions predicts per-layer steering effectiveness on downstream tasks, with a threshold near 0.3. These results bear on the reliability of linear safety interventions, which depend on the geometry being well-conditioned at the layer where they are applied.
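The primal/dual cosine diagnostic can be sketched as follows. This is our illustrative reading, not the paper's definition: we take the "dual" image of a Euclidean ("primal") direction v to be its metric transform H(λ)v, so a cosine near 1 means the Bregman metric barely bends the direction, while a small cosine flags a layer where Euclidean steering is unreliable (the paper's reported threshold is near 0.3):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def primal_dual_cosine(v, lam):
    """Cosine between a Euclidean ('primal') direction v and its
    metric-transformed ('dual') image H(lam) v, where
    H(lam) = diag(p) - p p^T is the softmax log-normalizer Hessian.
    (Hypothetical mapping; the paper's construction may differ.)"""
    p = softmax(lam)
    H = np.diag(p) - np.outer(p, p)
    w = H @ v
    return float(v @ w / (np.linalg.norm(v) * np.linalg.norm(w) + 1e-12))

# Under uniform logits, any zero-mean direction is an eigenvector of H,
# so primal and dual directions coincide (cosine 1).
n = 16
v = np.zeros(n)
v[0], v[1] = 1.0, -1.0
print(primal_dual_cosine(v, np.zeros(n)))   # ≈ 1.0

# The all-ones direction lies in the null space of H: the dual image
# vanishes and steering along it cannot move the output distribution.
print(primal_dual_cosine(np.ones(n), np.zeros(n)))   # ≈ 0.0
```

The design rationale the abstract points at: a linear intervention is only as reliable as this alignment at the layer where it is applied, which is why the paper reports the cosine as a predictor of downstream steering effectiveness.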