Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD
arXiv cs.LG / 3/23/2026
📰 News · Models & Research
Key Points
- Discrete Moment Matching Distillation (D-MMD) is proposed to tackle the challenge of distilling discrete diffusion models (a minimal sketch of the underlying MMD idea follows this list).
- The method adapts ideas that have worked well for continuous diffusion distillation to the discrete setting, with the goal of preserving the quality and diversity of many-step sampling while using far fewer steps.
- It is demonstrated on text and image datasets, where the distilled generators outperform their teacher models.
- The work contributes a new approach to discrete diffusion model distillation and is released as an arXiv preprint (2603.20155v1).
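For context on the title's "Discrete MMD": maximum mean discrepancy (MMD) measures the distance between two distributions through kernel evaluations on samples, so a student model can be trained to match a teacher by minimizing it. The sketch below is a minimal, self-contained estimator of squared MMD between two batches of discrete token sequences. The Hamming-style kernel, function names, and toy data are assumptions chosen for illustration; they are not the paper's actual D-MMD objective.

```python
import numpy as np

def hamming_kernel(x, y, gamma=1.0):
    # Exponentiated Hamming-similarity kernel between two equal-length
    # discrete token sequences. Kernel choice is illustrative only; the
    # paper's actual kernel is not specified in this summary.
    agreement = np.mean(np.asarray(x) == np.asarray(y))
    return float(np.exp(gamma * agreement))

def mmd2_unbiased(X, Y, kernel=hamming_kernel):
    # Unbiased empirical estimate of squared MMD between sample sets
    # X (e.g., teacher outputs) and Y (e.g., student outputs).
    m, n = len(X), len(Y)
    k_xx = sum(kernel(X[i], X[j])
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_yy = sum(kernel(Y[i], Y[j])
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    k_xy = sum(kernel(x, y) for x in X for y in Y) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy batches of token sequences over a vocabulary of size 10
    # (hypothetical stand-ins for teacher and student samples).
    teacher = list(rng.integers(0, 10, size=(8, 16)))
    student = list(rng.integers(0, 10, size=(8, 16)))
    print(f"MMD^2 estimate: {mmd2_unbiased(teacher, student):.4f}")
```

In a distillation setting, an estimate along these lines (or a differentiable relaxation of it) would serve as the training signal pulling student samples toward the teacher's distribution; the preprint's precise formulation may differ.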
Related Articles
[D] Matryoshka Representation Learning
Reddit r/MachineLearning
Two new Qwen3.5 “Neo” fine‑tunes focused on fast, efficient reasoning
Reddit r/LocalLLaMA
HKIC, Gobi Partners and HKU team up for fund backing university research start-ups
SCMP Tech
Yann LeCun’s New LeWorldModel (LeWM) Research Targets JEPA Collapse in Pixel-Based Predictive World Modeling
MarkTechPost
Streaming experts
Simon Willison's Blog