Stochastic Scaling Limits and Synchronization by Noise in Deep Transformer Models

arXiv stat.ML · April 30, 2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper shows that, under stochastic scaling, the token dynamics across layers in a finite transformer with MLP blocks converge (pathwise) to a continuous-time stochastic interacting particle system.
  • It derives the specific stochastic partial differential equation (SPDE) that governs how the token distribution evolves in the limiting model; a schematic form is sketched just after this list.
  • The authors prove propagation of chaos, establishing that as the number of tokens grows large, tokens behave increasingly independently while still following the same limiting law.
  • The study demonstrates “synchronization by noise”: when the common noise is strong enough relative to the deterministic self-attention drift, the limiting stochastic system exhibits exponential decay of the interaction energy on average (see the estimate sketched below).
  • It also characterizes the activation functions that satisfy the coercivity condition required for these noise-driven synchronization results.
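
To make the first two points concrete, here is the schematic shape such limits usually take. The symbols below (drift b, noise coefficient σ, conditional law μ_t) are generic notation for a McKean–Vlasov system with common noise, not the paper's actual coefficients, which depend on the attention and MLP parameters.

```latex
% Representative token in the mean-field limit (illustrative notation):
% a McKean--Vlasov SDE driven by a common Brownian motion W
\mathrm{d}X_t = b(X_t, \mu_t)\,\mathrm{d}t + \sigma(X_t)\,\mathrm{d}W_t,
\qquad \mu_t = \mathrm{Law}\!\left(X_t \mid W\right).

% The conditional law \mu_t then solves a stochastic Fokker--Planck SPDE:
\mathrm{d}\mu_t = \Big[ -\nabla\cdot\big(b(x,\mu_t)\,\mu_t\big)
  + \tfrac{1}{2}\,\nabla^2 \!:\! \big(\sigma\sigma^{\top}\mu_t\big) \Big]\,\mathrm{d}t
  - \nabla\cdot\big(\sigma(x)\,\mu_t\big)\,\mathrm{d}W_t.
```

In common-noise settings, propagation of chaos (the third point) is typically conditional: given the common Brownian motion W, any fixed collection of tokens becomes asymptotically independent as the number of tokens grows, each following the law μ_t.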

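For the synchronization-by-noise claim, the advertised estimate has the following schematic form; the constants c_σ and L_b are placeholders for a noise-coercivity constant and a drift-regularity constant, not quantities taken from the paper.

```latex
% Interaction energy of N tokens:
\mathcal{E}_t = \frac{1}{N^2} \sum_{i,j=1}^{N} \big| X_t^{i} - X_t^{j} \big|^2.

% Schematic dissipation estimate: exponential decay on average once the
% common-noise coercivity dominates the self-attention drift,
\mathbb{E}\big[\mathcal{E}_t\big] \le \mathcal{E}_0 \, e^{-\lambda t}
\quad \text{with } \lambda = c_\sigma - L_b > 0.
```

On the last point, one natural reading of coercivity for an activation φ is strong monotonicity, (φ(x) − φ(y))(x − y) ≥ c (x − y)² for some c > 0. Under that reading, leaky ReLU with negative slope α > 0 qualifies (with c = α), while plain ReLU (c = 0) and saturating activations such as tanh fail it globally; the paper's exact condition may of course differ.
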
Abstract

We prove pathwise convergence of the layerwise evolution of tokens in a finite-depth, finite-width transformer model with multi-layer perceptron (MLP) blocks to a continuous-time stochastic interacting particle system. We also identify the stochastic partial differential equation describing the evolution of the tokens' distribution in this limit and prove propagation of chaos when the number of such tokens is large. The bounds we establish are quantitative, and the limits we consider commute. We further prove that the limiting stochastic model displays synchronization by noise and establish exponential dissipation of the interaction energy on average, provided that the common noise is sufficiently coercive relative to the deterministic self-attention drift. We finally characterize the activation functions satisfying this coercivity condition.
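
As a minimal numerical sketch of the pre-limit dynamics, the following simulates an interacting particle system with a softmax-similarity drift and multiplicative common noise passed through an activation, tracking the interaction energy along the path. Everything here (the drift without learned query/key/value maps, the tanh noise coefficient, step size, and constants) is an illustrative assumption, not the paper's model.

```python
import numpy as np

def attention_drift(X, beta=1.0):
    """Softmax self-attention drift: pull each token toward an
    attention-weighted mean of all tokens. Illustrative stand-in for
    the paper's drift, which involves learned query/key/value maps."""
    logits = beta * (X @ X.T)                    # (n, n) similarity scores
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the softmax
    A = np.exp(logits)
    A /= A.sum(axis=1, keepdims=True)
    return A @ X - X

def simulate(n=64, d=8, sigma=2.0, dt=1e-3, steps=2000, seed=0):
    """Euler--Maruyama discretization of the interacting particle system.
    One Brownian increment per step is shared by ALL tokens (common
    noise), entering multiplicatively through an assumed activation phi."""
    rng = np.random.default_rng(seed)
    phi = np.tanh                                # assumed noise activation
    X = rng.standard_normal((n, d))
    energy = np.empty(steps)
    for k in range(steps):
        dW = np.sqrt(dt) * rng.standard_normal(d)        # common increment
        X = X + attention_drift(X) * dt + sigma * phi(X) * dW
        diffs = X[:, None, :] - X[None, :, :]            # pairwise gaps
        energy[k] = (diffs ** 2).sum(axis=-1).mean()     # interaction energy
    return X, energy

if __name__ == "__main__":
    _, E = simulate()
    print(f"interaction energy: {E[0]:.4f} -> {E[-1]:.4f}")
```

Varying `sigma` against the drift strength `beta` is the numerical analogue of the paper's trade-off: synchronization should set in once the common-noise term dominates. Note that this toy attention drift is itself contractive, so isolating the purely noise-driven effect would require the paper's actual coefficients.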