On the Memorization of Consistency Distillation for Diffusion Models

arXiv cs.LG / April 28, 2026

Key Points

  • The paper investigates how additional training via distillation changes the balance between memorization and generalization in diffusion models, using consistency distillation as the main example (a sketch of the objective follows this list).
  • Experiments show that consistency distillation applied to a teacher model that has memorized data substantially reduces the memorization transferred to the student, while maintaining or even improving sample quality.
  • The authors provide a theoretical explanation grounded in a random feature neural network framework, arguing that distillation suppresses unstable feature directions linked to memorization.
  • The study concludes that distillation can function not only to speed up training or inference, but also to improve the memorization–generalization trade-off for more reliable deployment.
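
For readers unfamiliar with the mechanism, the sketch below shows the shape of a standard consistency distillation objective in the style of Song et al. (2023), the framework the paper takes as representative. It is a generic illustration under assumed choices (a VE-style noise schedule, a one-step Euler solver for the teacher's probability-flow ODE, vector-shaped data); `cd_loss` and the stand-in models are hypothetical, not the authors' code.

```python
import torch
import torch.nn as nn

def cd_loss(student, student_ema, teacher, x0, sigmas):
    """One consistency-distillation step on a batch of clean data x0.

    student, student_ema, teacher: callables (x, sigma) -> denoised estimate.
    sigmas: 1-D tensor of noise levels in ascending order.
    Assumes x0 has shape (batch, dim).
    """
    b = x0.shape[0]
    # Pick adjacent noise levels sigma_n < sigma_{n+1} for each example.
    n = torch.randint(0, len(sigmas) - 1, (b,))
    s_lo, s_hi = sigmas[n].view(-1, 1), sigmas[n + 1].view(-1, 1)

    # Diffuse the clean data to the higher noise level (VE parameterization).
    x_hi = x0 + s_hi * torch.randn_like(x0)

    with torch.no_grad():
        # One Euler step of the teacher's probability-flow ODE, s_hi -> s_lo.
        d = (x_hi - teacher(x_hi, s_hi)) / s_hi
        x_lo = x_hi + (s_lo - s_hi) * d
        # Target: the EMA ("stopped-gradient") student at the lower noise level.
        target = student_ema(x_lo, s_lo)

    # Consistency loss: the student's outputs along one ODE trajectory
    # should agree across noise levels.
    return nn.functional.mse_loss(student(x_hi, s_hi), target)

# Smoke test with a trivial stand-in "denoiser" (hypothetical, shape-checking only).
if __name__ == "__main__":
    net = lambda x, s: x
    print(cd_loss(net, net, net, torch.randn(8, 16),
                  torch.linspace(0.1, 10.0, 32)).item())
```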

Abstract

Diffusion models are central to modern generative modeling, and understanding how they balance memorization and generalization is critical for reliable deployment. Recent work has shown that memorization in diffusion models is shaped by training dynamics, with generalization and memorization emerging at different stages of training. However, deployed diffusion models are often further distilled, introducing an additional training phase whose impact on memorization is not well understood. In this work, we analyze how distillation reshapes memorization behavior in diffusion models, taking consistency distillation as a representative framework. Empirically, we show that when applied to a teacher model that has memorized data, consistency distillation significantly reduces transferred memorization in the student while preserving, and sometimes improving, sample quality. To explain this behavior, we provide a theoretical analysis using a random feature neural network model [Bonnaire et al., 2025], showing that consistency distillation suppresses unstable feature directions associated with memorization while preserving stable, generalizable modes. Our findings suggest that distillation can serve not only as an acceleration tool, but also as a mechanism for improving the memorization-generalization trade-off.
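
The toy experiment below illustrates the qualitative mechanism in a plain random-feature regressor. It is not Bonnaire et al.'s model or the paper's experiments: the tanh feature map, the ridge levels, the pure-noise targets used to force memorization, and the stable/unstable split at the median eigenvalue are all arbitrary choices made for the demo. Under these assumptions, refitting a "student" on the "teacher's" outputs at fresh inputs shrinks coefficients along small-eigenvalue (poorly sampled, unstable) feature directions while largely preserving the well-conditioned modes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 40, 10, 400                      # few samples, many features -> interpolation
W = rng.standard_normal((p, d)) / np.sqrt(d)
phi = lambda X: np.tanh(X @ W.T)           # fixed random feature map, output (N, p)

X = rng.standard_normal((n, d))
y = rng.standard_normal(n)                 # pure-noise targets: anything fit is memorized

# "Teacher": near-interpolating ridge fit (tiny ridge, so it memorizes the noise).
F = phi(X)
a_teacher = np.linalg.solve(F.T @ F + 1e-6 * np.eye(p), F.T @ y)

# "Distillation": fit a student to the teacher's outputs on fresh inputs.
Xs = rng.standard_normal((4 * n, d))
Fs = phi(Xs)
a_student = np.linalg.solve(Fs.T @ Fs + 1e-2 * np.eye(p), Fs.T @ (Fs @ a_teacher))

# Compare coefficient energy along stable vs. unstable eigen-directions of the
# (estimated) population feature covariance.
C = phi(rng.standard_normal((20 * p, d)))
evals, evecs = np.linalg.eigh(C.T @ C / (20 * p))
stable = evals > np.median(evals)
for name, a in (("teacher", a_teacher), ("student", a_student)):
    c = evecs.T @ a                        # coefficients in the eigenbasis
    print(f"{name}: stable-mode energy {np.sum(c[stable]**2):.3f}, "
          f"unstable-mode energy {np.sum(c[~stable]**2):.3f}")
```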