Non-Asymptotic Convergence of Discrete Diffusion Models: Masked and Random Walk Dynamics

arXiv stat.ML / 4/2/2026


Key Points

  • The paper addresses a gap in theory for diffusion models on discrete state spaces, which are less understood than their continuous (Gaussian noising) counterparts due to combinatorial and discrete-specific challenges.
  • It derives sharp non-asymptotic convergence guarantees for three discrete diffusion models: two finite-state models based on random-walk and masking dynamics, and a drifted random-walk process on the countably infinite space \(\mathbb{N}^d\).
  • Even with perfect access to the discrete score function, exact simulation of the backward process is infeasible; the authors therefore analyze time-discretized Euler-type approximations.
  • The convergence is bounded in both Kullback-Leibler divergence and total variation distance under minimal assumptions on the data distribution, avoiding boundedness requirements on the estimated score.
  • The computational complexity of each method scales linearly with the dimension (up to logarithmic factors), making the theoretical results practically relevant for high-dimensional discrete settings.
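To make the masking dynamics concrete, here is a minimal sketch of an absorbing (masking) forward process on a finite state space. The `MASK` encoding and the linear masking schedule are illustrative assumptions, not taken from the paper:

```python
import random

MASK = -1  # hypothetical encoding of the absorbing "mask" token

def mask_forward(x, t, T):
    """One trajectory of a masking forward process: each coordinate is
    independently replaced by MASK with probability t/T (a simple
    illustrative schedule), so at t = T every coordinate is masked."""
    p = t / T  # probability a coordinate has been absorbed by time t
    return [MASK if random.random() < p else xi for xi in x]

x0 = [3, 1, 4, 1, 5]           # a discrete data point in {0,...,9}^5
xT = mask_forward(x0, 10, 10)  # fully noised: every coordinate is MASK
```

The backward (generative) process would then unmask coordinates one step at a time, guided by an estimated discrete score; the paper's analysis quantifies the error incurred by discretizing that backward process in time.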

Abstract

Diffusion models for continuous state spaces based on Gaussian noising processes are now relatively well understood from both practical and theoretical perspectives. In contrast, results for diffusion models on discrete state spaces remain far less explored and pose significant challenges, particularly due to their combinatorial structure and their more recent introduction in generative modelling. In this work, we establish new and sharp convergence guarantees for three popular discrete diffusion models (DDMs). Two of these models are designed for finite state spaces and are based respectively on the random walk and the masking process. The third DDM we consider is defined on the countably infinite space \(\mathbb{N}^d\) and uses a drifted random walk as its forward process. For each of these models, the backward process can be characterized by a discrete score function that can, in principle, be estimated. However, even with perfect access to these scores, simulating the exact backward process is infeasible, and one must rely on time discretization. We study Euler-type approximations and establish convergence bounds in both Kullback-Leibler divergence and total variation distance for the resulting models, under minimal assumptions on the data distribution. To the best of our knowledge, this study provides the optimal non-asymptotic convergence guarantees for these noising processes that do not rely on boundedness assumptions on the estimated score. In particular, the computational complexity of each method scales only linearly in the dimension, up to logarithmic factors.
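The need for Euler-type time discretization can be illustrated on a toy continuous-time Markov chain. For a two-state chain with symmetric jump rate \(q\) (an illustrative setup, much simpler than the paper's models), the exact transition kernel over a step \(h\) is \(\exp(hQ)\), while an Euler-type scheme uses \(I + hQ\), i.e. jump with probability \(qh\); the per-step error is \(O(h^2)\):

```python
import math

# Two-state continuous-time Markov chain with symmetric jump rate q.
q, h = 1.0, 0.05  # jump rate and Euler step size

# Exact transition kernel exp(hQ) for Q = [[-q, q], [q, -q]]:
# probability of staying in the current state over time h.
p_stay_exact = (1 + math.exp(-2 * q * h)) / 2

# Euler-type approximation I + hQ: stay with probability 1 - q*h.
p_stay_euler = 1 - q * h

# The gap between the two is of order (q*h)^2 per step.
print(p_stay_exact, p_stay_euler)
```

Summing such per-step errors over the whole backward trajectory is, roughly, what a non-asymptotic analysis of this kind must control, with the added difficulty that the true rates involve an estimated score.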