Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD

arXiv cs.LG / 3/23/2026

📰 News · Models & Research

Key Points

  • Discrete Moment Matching Distillation (D-MMD) is proposed to tackle the challenge of distilling discrete diffusion models.
  • The method adapts successful ideas from continuous diffusion distillation to maintain high quality and diversity when sampling with a sufficient number of steps.
  • It is demonstrated on text and image datasets, where the newly distilled generators can even outperform their teacher models.
  • The work contributes a new approach to discrete diffusion model distillation and is released as an arXiv preprint (2603.20155v1).

Abstract

Distilling discrete diffusion models is currently difficult. In contrast, the continuous diffusion literature offers many distillation methods that can reduce sampling to a handful of steps. Our method, Discrete Moment Matching Distillation (D-MMD), leverages ideas that have been highly successful in the continuous domain. Whereas previous discrete distillation methods collapse, D-MMD maintains high quality and diversity, given sufficient sampling steps. We demonstrate this on both text and image datasets. Moreover, the newly distilled generators can outperform their teachers.
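The abstract does not detail the D-MMD objective itself, but the "moment matching" family of distillation losses it names generally works by driving a student's sample statistics toward a teacher's. As a generic illustration (not the paper's method), the sketch below estimates a squared Maximum Mean Discrepancy between two sample sets under an RBF kernel; the kernel bandwidth and toy Gaussian samples are assumptions for the example:

```python
# Illustrative only, NOT the paper's D-MMD: biased estimator of squared
# Maximum Mean Discrepancy (MMD) between two sample sets with an RBF kernel.
# A moment-matching distillation loss of this flavor pushes student samples
# to match the teacher's distributional statistics.
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # Pairwise RBF kernel matrix between rows of x and rows of y.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    # Biased estimate: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)].
    return (rbf_kernel(x, x, sigma).mean()
            + rbf_kernel(y, y, sigma).mean()
            - 2.0 * rbf_kernel(x, y, sigma).mean())

rng = np.random.default_rng(0)
teacher = rng.normal(0.0, 1.0, size=(500, 2))       # stand-in for teacher samples
student_good = rng.normal(0.0, 1.0, size=(500, 2))  # matches the teacher
student_bad = rng.normal(3.0, 1.0, size=(500, 2))   # shifted distribution

print(mmd2(teacher, student_good))  # small: distributions match
print(mmd2(teacher, student_bad))   # larger: distributions differ
```

Minimizing such a discrepancy with respect to the student's parameters is the basic mechanism; the paper's contribution is making a loss of this kind work in the discrete token setting.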