Transformers Learn Latent Mixture Models In-Context via Mirror Descent

arXiv cs.LG / 4/14/2026

Key Points

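This study proposes a framework that formulates the role played by transformer attention, estimating the causal importance of past tokens, as an in-context learning problem based on Mixture of Transition Distributions.
The latent variable captures the influence each past token has on the next one, and its distribution is parameterized by unobserved mixture weights that the transformer must learn from the context.
Through an explicit construction of a three-layer transformer, the authors show that it can exactly implement one step of Mirror Descent, and they prove that the resulting estimator is a first-order approximation of the Bayes-optimal predictor.
On learnability, experiments support that gradient descent reaches solutions consistent with the theory: the predictive distributions, attention patterns, and estimated transition matrix are close to the construction, and deeper models achieve performance close to multi-step Mirror Descent.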

Abstract

Sequence modelling requires determining which past tokens are causally relevant from the context and their importance: a process inherent to the attention layers in transformers, yet whose underlying learned mechanisms remain poorly understood. In this work, we formalize the task of estimating token importance as an in-context learning problem by introducing a framework based on Mixture of Transition Distributions, where a latent variable determines the influence of past tokens on the next. The distribution over this latent variable is parameterized by unobserved mixture weights that transformers must learn in-context. We demonstrate that transformers can implement Mirror Descent to learn these weights from the context. Specifically, we give an explicit construction of a three-layer transformer that exactly implements one step of Mirror Descent and prove that the resulting estimator is a first-order approximation of the Bayes-optimal predictor. Corroborating our construction and its learnability via gradient descent, we empirically show that transformers trained from scratch learn solutions consistent with our theory: their predictive distributions, attention patterns, and learned transition matrix closely match the construction, while deeper models achieve performance comparable to multi-step Mirror Descent.
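To make the idea of learning mixture weights with Mirror Descent more concrete, here is a minimal sketch of a single entropic Mirror Descent (exponentiated gradient) step on the lag weights of a Mixture of Transition Distributions model. It is an illustration under assumptions that are not stated in the abstract: the transition matrix is known and shared across lags, the objective is the average negative log-likelihood of the context, the weights start uniform, and the step size is arbitrary. The names `mtd_gradient` and `mirror_descent_step` are hypothetical, and this is not the paper's three-layer transformer construction.

```python
import math
from typing import List, Sequence


def mtd_gradient(weights: Sequence[float],
                 transition: Sequence[Sequence[float]],
                 context: Sequence[int]) -> List[float]:
    """Gradient of the average negative log-likelihood w.r.t. the lag weights.

    Assumed model: P(x_t | past) = sum_k weights[k] * transition[x_{t-k-1}][x_t].
    """
    num_lags = len(weights)
    num_terms = len(context) - num_lags
    if num_terms <= 0:
        raise ValueError("context must be longer than the number of lags")
    grad = [0.0] * num_lags
    for t in range(num_lags, len(context)):
        # Probability of the observed token under each lag's transition row.
        per_lag = [transition[context[t - k - 1]][context[t]]
                   for k in range(num_lags)]
        mixture = sum(w * p for w, p in zip(weights, per_lag))
        for k in range(num_lags):
            grad[k] -= per_lag[k] / (mixture * num_terms)
    return grad


def mirror_descent_step(weights: Sequence[float],
                        grad: Sequence[float],
                        step_size: float) -> List[float]:
    """One Mirror Descent step with the negative-entropy mirror map.

    This is the multiplicative update followed by renormalization, which
    keeps the weights on the probability simplex.
    """
    unnormalized = [w * math.exp(-step_size * g) for w, g in zip(weights, grad)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Toy example (all values are illustrative assumptions).
transition = [[0.9, 0.1],
              [0.1, 0.9]]           # "sticky" two-state transition matrix
context = [0, 1, 0, 1, 0, 1]        # each token repeats the token two steps back
weights = [0.5, 0.5]                # uniform start over lag 1 and lag 2

grad = mtd_gradient(weights, transition, context)
new_weights = mirror_descent_step(weights, grad, step_size=1.0)
print(new_weights)                  # weight shifts toward lag 2
```

In this toy context the second lag explains every token better than the first, so a single step moves mass toward it while the weights remain a valid probability vector. Repeating the step would correspond to the multi-step Mirror Descent that the abstract compares deeper models against.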
