An Analytical Theory of Spectral Bias in the Learning Dynamics of Diffusion Models

arXiv stat.ML / 4/7/2026


Key Points

  • The paper proposes an analytical framework describing how diffusion models’ generated distributions evolve during training, including closed-form results for the learning dynamics of linear and convolutional denoisers.
  • It derives a universal “inverse-variance spectral law” stating that a mode’s time to reach its target variance scales as \(\tau\propto\lambda^{-1}\), meaning coarse/high-variance structure is learned much faster than fine/low-variance detail.
  • The analysis shows that weight sharing (e.g., via circulant full-width convolution) primarily accelerates learning by effectively multiplying learning rates, but does not remove the underlying spectral bias.
  • It finds that local convolution changes the learning dynamics qualitatively: convolutional U-Nets exhibit near-simultaneous emergence of many modes, unlike deep MLP-based U-Nets.
  • Experiments on both synthetic (Gaussian) and natural-image datasets support the persistence of the spectral law while highlighting architecture-dependent deviations driven by local convolution’s inductive bias.
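The inverse-variance law in the second bullet can be sketched in one dimension (a simplification for illustration, not the paper's full multi-mode setting): for a single eigenmode with variance \(\lambda\), a scalar denoiser weight \(w\) trained on the population loss \(L(w)=\mathbb{E}[(wx-x)^2]=\lambda(w-1)^2\), \(x\sim\mathcal{N}(0,\lambda)\), evolves under gradient flow \(\dot{w}=-2\lambda(w-1)\) as \(w(t)=1-e^{-2\lambda t}\). The time to come within a fixed fraction \(1-\epsilon\) of the target is therefore \(\tau=\ln(1/\epsilon)/(2\lambda)\propto\lambda^{-1}\): halving a mode's variance doubles the time needed to learn it.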

Abstract

We develop an analytical framework for understanding how the generated distribution evolves during diffusion model training. Leveraging a Gaussian-equivalence principle, we solve the full-batch gradient-flow dynamics of linear and convolutional denoisers and integrate the resulting probability-flow ODE, yielding analytic expressions for the generated distribution. The theory exposes a universal inverse-variance spectral law: the time for an eigen- or Fourier mode to match its target variance scales as \(\tau\propto\lambda^{-1}\), so high-variance (coarse) structure is mastered orders of magnitude sooner than low-variance (fine) detail. Extending the analysis to deep linear networks and circulant full-width convolutions shows that weight sharing merely multiplies learning rates -- accelerating but not eliminating the bias -- whereas local convolution introduces a qualitatively different bias. Experiments on Gaussian and natural-image datasets confirm the spectral law persists in deep MLP-based UNets. Convolutional U-Nets, however, display rapid near-simultaneous emergence of many modes, implicating local convolution in reshaping learning dynamics. These results underscore how data covariance governs the order and speed with which diffusion models learn, and they call for deeper investigation of the unique inductive biases introduced by local convolution.
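The \(\tau\propto\lambda^{-1}\) scaling can be checked numerically in the simplest possible setting. The sketch below (a one-dimensional toy, not the paper's full denoiser) runs gradient descent per eigenmode on the population loss \(\lambda(w-1)^2\) and counts the steps until the weight reaches a fixed fraction of its target; the step count times \(\lambda\) should come out roughly constant across modes.

```python
# Hedged sketch: per-eigenmode gradient descent for a scalar linear
# denoiser with population loss L(w) = E[(w*x - x)^2] = lam*(w - 1)^2,
# x ~ N(0, lam). Gradient flow gives w(t) = 1 - exp(-2*lam*t), so the
# time to reach a fixed fraction of the target scales as 1/lam.

def steps_to_threshold(lam, lr=1e-3, frac=0.9, max_steps=10_000_000):
    """Run gradient descent on L(w) = lam*(w-1)^2 from w=0;
    return the number of steps until w >= frac."""
    w = 0.0
    for step in range(1, max_steps + 1):
        w -= lr * 2.0 * lam * (w - 1.0)  # exact population gradient
        if w >= frac:
            return step
    return max_steps

# Eigenvalues spanning a factor of 8: learning time should grow 8x.
lams = [1.0, 0.5, 0.25, 0.125]
steps = [steps_to_threshold(lam) for lam in lams]

# Inverse-variance law: steps * lam is approximately constant.
products = [s * lam for s, lam in zip(steps, lams)]
print(steps)
print(products)
```

Under gradient flow the constant would be exactly \(\ln(1/\epsilon)/(2\,\mathrm{lr})\); discrete gradient descent matches it up to \(O(\mathrm{lr}\,\lambda)\) corrections, which is why a small learning rate is used.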