Homogenized Transformers

arXiv stat.ML / 4/3/2026

Key Points

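The paper formulates a random deep multi-head self-attention model "close to initialization", in which the weights are resampled independently for each layer and head. Treating depth as a time variable, it interprets the residual-stream dynamics as an interacting particle system.
Under suitable joint scalings of depth, residual step size, and number of heads, it proves that the residual-stream dynamics has a nontrivial homogenized limit. Depending on the scaling, this limit is either deterministic or stochastic with common noise.
In the mean-field regime, it shows that the common noise leads to a stochastic nonlinear Fokker–Planck equation for the conditional law of a representative token. In the Gaussian setting the drift vanishes, so the homogenized dynamics can be treated explicitly.
This enables an analysis of representation collapse. The paper derives quantitative trade-offs between dimension, context length, and temperature, and identifies regimes in which clustering (excessive concentration of tokens onto a few representations) can be mitigated.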
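
To make the particle-system picture concrete, here is a minimal pure-Python sketch of the finite-depth dynamics. It is not the paper's code, and several choices are assumptions not taken from the source:

- weights are i.i.d. N(0, 1/d);
- the H heads are summed and scaled by step/sqrt(H);
- the residual update is projected back onto the unit sphere by normalization;
- the softmax uses an inverse temperature beta.

The helper names (simulate, random_layer, mean_pairwise_cosine) are hypothetical. Mean pairwise cosine similarity serves as a simple collapse diagnostic: values near 1 mean the tokens have clustered.

```python
import math
import random


def normalize(v):
    """Project a nonzero vector back onto the unit sphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def gaussian_matrix(d, rng):
    # Assumption: i.i.d. N(0, 1/d) entries, a common initialization scale.
    s = 1.0 / math.sqrt(d)
    return [[rng.gauss(0.0, s) for _ in range(d)] for _ in range(d)]


def attention_head(tokens, beta, rng):
    """One head with freshly sampled Q, K, V shared by all tokens."""
    d = len(tokens[0])
    Q, K, V = (gaussian_matrix(d, rng) for _ in range(3))
    qs = [matvec(Q, x) for x in tokens]
    ks = [matvec(K, x) for x in tokens]
    vs = [matvec(V, x) for x in tokens]
    out = []
    for q in qs:
        scores = [beta * sum(a * b for a, b in zip(q, k)) for k in ks]
        top = max(scores)  # subtract the max for a numerically stable softmax
        w = [math.exp(s - top) for s in scores]
        z = sum(w)
        out.append([sum(wj * v[c] for wj, v in zip(w, vs)) / z for c in range(d)])
    return out


def random_layer(tokens, num_heads, step, beta, rng):
    """Residual update from num_heads fresh heads, then projection to the sphere."""
    d = len(tokens[0])
    total = [[0.0] * d for _ in tokens]
    for _ in range(num_heads):
        for acc, h in zip(total, attention_head(tokens, beta, rng)):
            for c in range(d):
                acc[c] += h[c]
    scale = step / math.sqrt(num_heads)  # assumption: 1/sqrt(H) head aggregation
    return [normalize([x + scale * u for x, u in zip(xi, ui)])
            for xi, ui in zip(tokens, total)]


def mean_pairwise_cosine(tokens):
    """Average <x_i, x_j> over pairs i < j; values near 1 indicate collapse."""
    n = len(tokens)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(sum(a * b for a, b in zip(tokens[i], tokens[j]))
               for i, j in pairs) / len(pairs)


def simulate(n=8, d=4, depth=50, num_heads=4, step=0.1, beta=1.0, seed=0):
    """Run depth random layers on n random unit tokens in dimension d."""
    rng = random.Random(seed)
    tokens = [normalize([rng.gauss(0.0, 1.0) for _ in range(d)]) for _ in range(n)]
    for _ in range(depth):  # depth plays the role of time
        tokens = random_layer(tokens, num_heads, step, beta, rng)
    return tokens


tokens = simulate()
print(round(mean_pairwise_cosine(tokens), 3))
```

You can vary d, n, or beta in simulate and watch the diagnostic. This is one informal way to explore the dimension, context-length, and temperature trade-offs the paper quantifies. The sketch only simulates a finite number of layers; it does not compute the homogenized limit.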

Abstract

We study a random model of deep multi-head self-attention in which the weights are resampled independently across layers and heads, as at initialization of training. Viewing depth as a time variable, the residual stream defines a discrete-time interacting particle system on the unit sphere. We prove that, under suitable joint scalings of the depth, the residual step size, and the number of heads, this dynamics admits a nontrivial homogenized limit. Depending on the scaling, the limit is either deterministic or stochastic with common noise; in the mean-field regime, the latter leads to a stochastic nonlinear Fokker–Planck equation for the conditional law of a representative token. In the Gaussian setting, the limiting drift vanishes, making the homogenized dynamics explicit enough to study representation collapse. This yields quantitative trade-offs between dimension, context length, and temperature, and identifies regimes in which clustering can be mitigated.
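
The roles of the Gaussian setting and of the head scaling can be illustrated with an elementary symmetry argument, sketched below in pure Python. This is a heuristic illustration, not the paper's proof. It assumes i.i.d. centered Gaussian weights with variance 1/d, a plain softmax with inverse temperature beta, and hypothetical helpers (head_update, aggregate).

The argument runs as follows:

- The softmax weights depend only on Q, K, and the tokens, while V enters linearly. Replacing V by -V, which has the same Gaussian law, therefore negates the update, so the expected one-layer update in this sketch is zero.
- One draw of Q, K, V is shared by every token in a layer, so whatever fluctuation remains is common to all tokens.
- Summing H independent heads and dividing by H makes the update vanish as H grows (a law-of-large-numbers scaling). Dividing by sqrt(H) instead keeps fluctuations of order one (a central-limit scaling).

This is one heuristic way to see why the choice of scaling can yield a deterministic or a common-noise limit. The paper's limiting equations may contain terms this sketch does not capture.

```python
import math
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sample_weights(d, rng):
    """Fresh Q, K, V with i.i.d. N(0, 1/d) entries (assumed scale)."""
    s = 1.0 / math.sqrt(d)
    return [[[rng.gauss(0.0, s) for _ in range(d)] for _ in range(d)]
            for _ in range(3)]


def head_update(tokens, Q, K, V, beta):
    """Attention output per token. The softmax weights depend on Q, K and
    the tokens only; V enters linearly through the value vectors."""
    ks = [matvec(K, x) for x in tokens]
    vs = [matvec(V, x) for x in tokens]
    out = []
    for x in tokens:
        q = matvec(Q, x)
        scores = [beta * sum(a * b for a, b in zip(q, k)) for k in ks]
        top = max(scores)  # numerically stable softmax
        w = [math.exp(s - top) for s in scores]
        z = sum(w)
        out.append([sum(wj * v[c] for wj, v in zip(w, vs)) / z
                    for c in range(len(x))])
    return out


def aggregate(tokens, num_heads, beta, power, seed):
    """Sum num_heads independent heads and divide by num_heads ** power.
    power=1.0 averages (law-of-large-numbers scaling); power=0.5 is CLT scaling."""
    rng = random.Random(seed)
    d = len(tokens[0])
    total = [[0.0] * d for _ in tokens]
    for _ in range(num_heads):
        Q, K, V = sample_weights(d, rng)
        for acc, u in zip(total, head_update(tokens, Q, K, V, beta)):
            for c in range(d):
                acc[c] += u[c]
    return [[x / num_heads ** power for x in row] for row in total]


def rms(rows):
    flat = [x for row in rows for x in row]
    return math.sqrt(sum(x * x for x in flat) / len(flat))


tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
rng = random.Random(1)
Q, K, V = sample_weights(3, rng)
minus_V = [[-x for x in row] for row in V]
plus = head_update(tokens, Q, K, V, beta=1.0)
minus = head_update(tokens, Q, K, minus_V, beta=1.0)
# V -> -V has the same Gaussian law and flips the update exactly,
# so averaging over the weight distribution gives a zero mean update.
print(max(abs(p + m) for pr, mr in zip(plus, minus) for p, m in zip(pr, mr)))

for H in (4, 64, 1024):
    print(H, rms(aggregate(tokens, H, 1.0, 1.0, seed=0)),
          rms(aggregate(tokens, H, 1.0, 0.5, seed=0)))
```

In the printed table, the first rms column (divide by H) should shrink roughly like 1/sqrt(H), up to sampling noise. The second column (divide by sqrt(H)) should stay of comparable size as H grows.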
