Diffusion Model for Manifold Data: Score Decomposition, Curvature, and Statistical Complexity

arXiv cs.LG / 2026/3/24


Key Points

  • The paper advances theoretical understanding of diffusion models for high-dimensional data that effectively lies on a lower-dimensional smooth Riemannian manifold.
  • It analyzes how diffusion models decompose the score function across different noise injection regimes and how manifold curvature shapes that score structure.
  • These geometric insights enable an efficient neural network approximation of the score function.
  • The work derives statistical rates for both score estimation and distribution learning, showing that performance depends on intrinsic data dimension and manifold curvature.
  • Overall, the study aims to bridge diffusion-model theory with practical learning behavior for generative modeling on manifold-structured data.

Abstract

Diffusion models have become a leading framework in generative modeling, yet their theoretical understanding, especially for high-dimensional data concentrated on low-dimensional structures, remains incomplete. This paper investigates how diffusion models learn such structured data, focusing on two key aspects: statistical complexity and the influence of the data's geometric properties. By modeling data as samples from a smooth Riemannian manifold, our analysis reveals crucial decompositions of score functions in diffusion models under different levels of injected noise. We also highlight the interplay of manifold curvature with the structures in the score function. These analyses enable an efficient neural network approximation to the score function, building on which we further provide statistical rates for score estimation and distribution learning. Remarkably, the obtained statistical rates are governed by the intrinsic dimension of the data and the manifold curvature. These results advance the statistical foundations of diffusion models, bridging theory and practice for generative modeling on manifolds.
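
To make the idea of a score decomposition under different noise levels concrete, here is a minimal toy sketch. It is not the paper's construction: it assumes data sampled on the unit circle in R^2 (intrinsic dimension 1), a forward process x_t = alpha * x_0 + sigma * noise, and the closed-form score of the resulting Gaussian-smoothed empirical distribution. The function names `empirical_score` and `split_normal_tangent` are hypothetical and chosen for illustration only. The sketch splits the score at a query point into a component normal to the circle and a component tangent to it.

```python
import numpy as np

def empirical_score(x, data, alpha, sigma):
    """Score (gradient of the log-density) at x of the mixture
    (1/n) * sum_i N(alpha * data[i], sigma^2 I).
    Toy stand-in for a diffusion marginal; an assumption, not the paper's model."""
    diffs = alpha * data - x                          # shape (n, d)
    logits = -np.sum(diffs ** 2, axis=1) / (2.0 * sigma ** 2)
    logits -= logits.max()                            # stabilize the softmax
    weights = np.exp(logits)
    weights /= weights.sum()                          # posterior weight of each data point
    return (weights[:, None] * diffs).sum(axis=0) / sigma ** 2

def split_normal_tangent(x, score):
    """Project a 2-D score onto the radial (normal to the unit circle)
    and tangential directions at the query point x."""
    normal_dir = x / np.linalg.norm(x)
    tangent_dir = np.array([-normal_dir[1], normal_dir[0]])
    return float(score @ normal_dir), float(score @ tangent_dir)

# Evenly spaced points on the unit circle (assumed toy data set).
thetas = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
circle = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)

query = np.array([1.5, 0.0])                          # a point off the manifold
for sigma in (0.05, 0.5):
    score = empirical_score(query, circle, alpha=1.0, sigma=sigma)
    normal, tangent = split_normal_tangent(query, score)
    print(f"sigma={sigma}: normal={normal:.3f}, tangent={tangent:.3e}")
```

In this toy setting the normal component pulls the query point back toward the circle and grows as the noise level shrinks, while the tangential component vanishes by symmetry because the data are uniform along the circle. With non-uniform data the tangential part would carry information about the density on the manifold. How the paper's decompositions and curvature terms are actually defined is not stated in the abstract, so this should be read only as intuition.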