An Analytical Theory of Spectral Bias in the Learning Dynamics of Diffusion Models

arXiv stat.ML / 4/7/2026


Key Points

  • The paper proposes an analytical framework describing how diffusion models’ generated distributions evolve during training, including closed-form results for the learning dynamics of linear and convolutional denoisers.
  • It derives a universal “inverse-variance spectral law” stating that a mode’s time to reach its target variance scales as \(\tau\propto\lambda^{-1}\), meaning coarse/high-variance structure is learned much faster than fine/low-variance detail.
  • The analysis shows that weight sharing (e.g., via circulant full-width convolution) primarily accelerates learning by effectively multiplying learning rates, but does not remove the underlying spectral bias.
  • It finds that local convolution changes the learning dynamics qualitatively: convolutional U-Nets exhibit near-simultaneous emergence of many modes, unlike deep MLP-based U-Nets.
  • Experiments on both synthetic (Gaussian) and natural-image datasets support the persistence of the spectral law while highlighting architecture-dependent deviations driven by local convolution’s inductive bias.
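The inverse-variance law in the second bullet can be sketched in one dimension (a simplification for illustration, not the paper's full multi-mode setting): for a single eigenmode with variance \(\lambda\), a scalar denoiser weight \(w\) trained on the population loss \(L(w)=\mathbb{E}[(wx-x)^2]=\lambda(w-1)^2\), \(x\sim\mathcal{N}(0,\lambda)\), evolves under gradient flow \(\dot{w}=-2\lambda(w-1)\) as \(w(t)=1-e^{-2\lambda t}\). The time to come within a fixed fraction \(1-\epsilon\) of the target is therefore \(\tau=\ln(1/\epsilon)/(2\lambda)\propto\lambda^{-1}\): halving a mode's variance doubles the time needed to learn it.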

Abstract

We develop an analytical framework for understanding how the generated distribution evolves during diffusion model training. Leveraging a Gaussian-equivalence principle, we solve the full-batch gradient-flow dynamics of linear and convolutional denoisers and integrate the resulting probability-flow ODE, yielding analytic expressions for the generated distribution. The theory exposes a universal inverse-variance spectral law: the time for an eigen- or Fourier mode to match its target variance scales as \(\tau\propto\lambda^{-1}\), so high-variance (coarse) structure is mastered orders of magnitude sooner than low-variance (fine) detail. Extending the analysis to deep linear networks and circulant full-width convolutions shows that weight sharing merely multiplies learning rates -- accelerating but not eliminating the bias -- whereas local convolution introduces a qualitatively different bias. Experiments on Gaussian and natural-image datasets confirm the spectral law persists in deep MLP-based UNets. Convolutional U-Nets, however, display rapid near-simultaneous emergence of many modes, implicating local convolution in reshaping learning dynamics. These results underscore how data covariance governs the order and speed with which diffusion models learn, and they call for deeper investigation of the unique inductive biases introduced by local convolution.
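The \(\tau\propto\lambda^{-1}\) scaling can be checked numerically in the simplest possible setting. The sketch below (a one-dimensional toy, not the paper's full denoiser) runs gradient descent per eigenmode on the population loss \(\lambda(w-1)^2\) and counts the steps until the weight reaches a fixed fraction of its target; the step count times \(\lambda\) should come out roughly constant across modes.

```python
# Hedged sketch: per-eigenmode gradient descent for a scalar linear
# denoiser with population loss L(w) = E[(w*x - x)^2] = lam*(w - 1)^2,
# x ~ N(0, lam). Gradient flow gives w(t) = 1 - exp(-2*lam*t), so the
# time to reach a fixed fraction of the target scales as 1/lam.

def steps_to_threshold(lam, lr=1e-3, frac=0.9, max_steps=10_000_000):
    """Run gradient descent on L(w) = lam*(w-1)^2 from w=0;
    return the number of steps until w >= frac."""
    w = 0.0
    for step in range(1, max_steps + 1):
        w -= lr * 2.0 * lam * (w - 1.0)  # exact population gradient
        if w >= frac:
            return step
    return max_steps

# Eigenvalues spanning a factor of 8: learning time should grow 8x.
lams = [1.0, 0.5, 0.25, 0.125]
steps = [steps_to_threshold(lam) for lam in lams]

# Inverse-variance law: steps * lam is approximately constant.
products = [s * lam for s, lam in zip(steps, lams)]
print(steps)
print(products)
```

Under gradient flow the constant would be exactly \(\ln(1/\epsilon)/(2\,\mathrm{lr})\); discrete gradient descent matches it up to \(O(\mathrm{lr}\,\lambda)\) corrections, which is why a small learning rate is used.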