We’ve been working on a probabilistic interpretation of causal self-attention where token embeddings are treated as latent variables. In that view, the attention map induces a change-of-variables term, which leads to a barrier / degeneracy boundary in embedding space.
The resulting picture is:
- a stability-margin interpretation of causal attention
- “support tokens,” i.e. the positions closest to the degeneracy boundary
- a simple MAP-style training penalty: standard cross-entropy plus a smooth log-barrier term
Empirically, this improves robustness to input perturbations and makes the learned geometry more margin-concentrated, without much loss in clean accuracy at modest regularization strengths.
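For concreteness, here is a minimal sketch of the kind of penalty described above: standard cross-entropy plus a smooth log-barrier on per-token margins. The margin values are a stand-in — the post doesn't specify how distance to the degeneracy boundary is computed — and the names `lam` and `eps` are hypothetical knobs, not anything from the actual method:

```python
import numpy as np

def cross_entropy(logits, target):
    # Softmax cross-entropy for a single example (numerically stabilized).
    z = logits - logits.max()
    logp = z - np.log(np.exp(z).sum())
    return -logp[target]

def log_barrier(margins, eps=1e-6):
    # Smooth log-barrier: blows up as a margin shrinks toward zero,
    # so the positions nearest the boundary ("support tokens" in the
    # post's terminology) dominate the penalty. `eps` just guards the log.
    return -np.log(np.clip(margins, eps, None)).mean()

def map_style_loss(logits, target, margins, lam=0.1):
    # MAP-style objective: data term plus barrier term, weighted by `lam`.
    return cross_entropy(logits, target) + lam * log_barrier(margins)
```

With margins well inside the feasible region (near 1) the barrier term is roughly zero and the loss reduces to plain cross-entropy; as any margin approaches zero the penalty grows without bound, which is what pushes the geometry away from the degeneracy boundary.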
Curious whether this framing feels natural to people, or whether it reads more like a <insert-your-favorite-regularizer-here> than a genuinely probabilistic view.
