Generalization Properties of Score-matching Diffusion Models for Intrinsically Low-dimensional Data

arXiv stat.ML / 4/24/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper addresses a gap in statistical theory for score-based diffusion models by deriving finite-sample error bounds for learning an unknown data distribution from limited samples.
  • It improves on earlier analyses by accounting for the intrinsic low-dimensional structure of real data, yielding convergence rates governed by a new (p,q)-Wasserstein dimension rather than the ambient dimension.
  • Under mild assumptions on the forward diffusion process and the data distribution, the authors prove Wasserstein-p generalization guarantees for all p ≥ 1 using only a finite-moment condition (no compact support, manifold, or smooth-density requirements).
  • They show that the expected Wasserstein-p error scales roughly as \(\widetilde{O}(n^{-1 / d^\ast_{p,q}(\mu)})\), demonstrating natural adaptation to data geometry and mitigation of the curse of dimensionality (see the back-of-the-envelope illustration after this list).
  • The work also connects diffusion-model analysis to GAN theory and to sharp minimax rates from optimal transport, and extends Wasserstein dimension concepts to unbounded-support distributions.
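To make the headline rate concrete, consider a back-of-the-envelope calculation; the dimension values below are illustrative assumptions, not figures from the paper. If the error decays as \(n^{-1/d^\ast}\), then reducing the error by a factor of 2 requires multiplying the sample size by \(2^{d^\ast}\), since

\[
\left(\frac{n'}{n}\right)^{-1/d^\ast} = \frac{1}{2} \quad\Longleftrightarrow\quad n' = 2^{d^\ast}\, n.
\]

For RGB images of size \(256 \times 256\), the ambient dimension is \(3 \cdot 256 \cdot 256 = 196{,}608\), so an ambient-dimension rate would demand a factor of \(2^{196608}\) more samples per halving of the error. If the intrinsic dimension is instead, say, \(d^\ast_{p,q}(\mu) \approx 10\), the same improvement costs only a factor of \(2^{10} = 1024\).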

Abstract

Despite the remarkable empirical success of score-based diffusion models, their statistical guarantees remain underdeveloped. Existing analyses often provide pessimistic convergence rates that do not reflect the intrinsic low-dimensional structure common in real data, such as that arising in natural images. In this work, we study the statistical convergence of score-based diffusion models for learning an unknown distribution \(\mu\) from finitely many samples. Under mild regularity conditions on the forward diffusion process and the data distribution, we derive finite-sample error bounds on the learned generative distribution, measured in the Wasserstein-\(p\) distance. Unlike prior results, our guarantees hold for all \(p \ge 1\) and require only a finite-moment assumption on \(\mu\), without compact-support, manifold, or smooth-density conditions. Specifically, given \(n\) i.i.d. samples from \(\mu\) with finite \(q\)-th moment and appropriately chosen network architectures, hyperparameters, and discretization schemes, we show that the expected Wasserstein-\(p\) error between the learned distribution \(\hat{\mu}\) and \(\mu\) scales as \(\mathbb{E}\,\mathbb{W}_p(\hat{\mu},\mu) = \widetilde{O}\!\left(n^{-1/d^\ast_{p,q}(\mu)}\right)\), where \(d^\ast_{p,q}(\mu)\) is the \((p,q)\)-Wasserstein dimension of \(\mu\). Our results demonstrate that diffusion models naturally adapt to the intrinsic geometry of data and mitigate the curse of dimensionality, since the convergence rate depends on \(d^\ast_{p,q}(\mu)\) rather than the ambient dimension. Moreover, our theory conceptually bridges the analysis of diffusion models with that of GANs and the sharp minimax rates established in optimal transport. The proposed \((p,q)\)-Wasserstein dimension also extends the notion of classical Wasserstein dimension to distributions with unbounded support, which may be of independent theoretical interest.
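The intrinsic-dimension behavior behind this rate can be sanity-checked numerically. Below is a minimal simulation sketch, not code from the paper: it estimates the empirical Wasserstein-1 distance between \(n\) samples and a large reference sample from a distribution with intrinsic dimension \(d = 3\) embedded in \(\mathbb{R}^{32}\), using the POT library (an assumed dependency); the Gaussian data, dimensions, and sample sizes are all illustrative choices. The decay of the estimates should track the intrinsic rate \(n^{-1/d}\) rather than the ambient rate \(n^{-1/32}\).

```python
# Minimal numerical sanity check (illustrative sketch, not code from the paper).
# Requires NumPy and POT ("pip install numpy pot").
import numpy as np
import ot  # Python Optimal Transport

rng = np.random.default_rng(0)
D, d = 32, 3  # ambient dimension D, intrinsic dimension d

def sample(n: int) -> np.ndarray:
    """Draw n points from a d-dimensional Gaussian embedded in R^D."""
    x = np.zeros((n, D))
    x[:, :d] = rng.standard_normal((n, d))
    return x

# A large reference sample stands in for the true distribution mu.
n_ref = 4000
ref = sample(n_ref)

for n in (50, 100, 200, 400, 800, 1600):
    x = sample(n)
    M = ot.dist(x, ref, metric="euclidean")      # pairwise costs for W_1
    w1 = ot.emd2(ot.unif(n), ot.unif(n_ref), M)  # exact empirical OT cost
    print(f"n = {n:5d}   empirical W1 ~ {w1:.3f}")

# A log-log fit of W1 against n should give a slope near -1/d = -1/3,
# not -1/D = -1/32; the finite reference sample caps achievable accuracy.
```

Exact transport via `ot.emd2` keeps the check unbiased at these sample sizes; for much larger \(n\), one would typically substitute the entropic approximation `ot.sinkhorn2`.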