[R] Causal self-attention as a probabilistic model over embeddings

Reddit r/MachineLearning / 2026/3/24


Key points

  • The article proposes a probabilistic interpretation of causal self-attention in which token embeddings act as latent variables.
  • It argues that the attention map introduces a change-of-variables term that creates a barrier/degeneracy boundary in embedding space.
  • Under this framework, causal attention is reinterpreted as providing a stability-margin, with “support tokens” being those closest to the degeneracy boundary.
  • The authors derive a MAP-style training objective that combines standard cross-entropy with a smooth log-barrier penalty to enforce the margin behavior.
  • Empirical results suggest improved robustness to input perturbations and more margin-concentrated embedding geometry with little loss on clean accuracy when regularization strength is modest.

We’ve been working on a probabilistic interpretation of causal self-attention where token embeddings are treated as latent variables. In that view, the attention map induces a change-of-variables term, which leads to a barrier / degeneracy boundary in embedding space.

The resulting picture is:

  • a stability-margin interpretation of causal attention
  • “support tokens,” i.e. the positions closest to the degeneracy boundary
  • a simple MAP-style training penalty: standard cross-entropy plus a smooth log-barrier term
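The MAP-style penalty described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each token's distance to the degeneracy boundary has already been computed as a positive scalar `margin`, and the function names, the clamp `eps`, and the weight `lam` are all hypothetical choices.

```python
import math

def log_barrier_penalty(margins, lam=0.1, eps=1e-6):
    """Smooth log-barrier on per-token margins (hypothetical sketch).

    margins: positive distances of each token embedding to the
    degeneracy boundary; how these are computed is model-specific
    and not spelled out in the post.
    """
    # -log(m) blows up as m -> 0, penalizing tokens that approach
    # the boundary; eps clamps away numerical blow-up.
    return -lam * sum(math.log(max(m, eps)) for m in margins) / len(margins)

def map_style_loss(cross_entropy, margins, lam=0.1):
    # MAP-style objective: standard cross-entropy plus the barrier term.
    return cross_entropy + log_barrier_penalty(margins, lam)
```

Under this sketch, tokens with margin 1 contribute nothing to the penalty, while the "support tokens" (smallest margins) dominate it, which matches the stability-margin reading.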

Empirically, this improves robustness to input perturbations and makes the learned geometry more margin-concentrated, without much loss in clean accuracy at modest regularization strengths.

Curious whether this framing feels natural to people, or whether it reads more like a <insert-your-favorite-regularizer-here> than a genuinely probabilistic view.

submitted by /u/Old-Letterhead-1945