1.x-Distill: Breaking the Diversity, Quality, and Efficiency Barrier in Distribution Matching Distillation

arXiv cs.CV / 4/7/2026


Key Points

  • 1.x-Distill is a fractional-step distillation framework for reducing the iterative denoising cost of diffusion models; it breaks the integer-step constraint of prior methods and aims to make 1.x-step generation practical.
  • To counter the diversity collapse and fidelity degradation that Distribution Matching Distillation (DMD) tends to suffer at two steps or fewer, the authors analyze the role of teacher CFG and propose a modification that suppresses mode collapse.
  • To improve performance under extreme step budgets, they introduce Stagewise Focused Distillation, a two-stage strategy that learns coarse structure through diversity-preserving distribution matching and then refines details with inference-consistent adversarial distillation.
  • They further design a lightweight compensation module for Distill–Cache co-Training, which naturally incorporates block-level caching into the distillation pipeline.
  • Experiments on SD3-Medium and SD3.5-Large show improved quality and diversity over prior few-step distillation methods, at effective NFEs of 1.67 and 1.74 respectively, with up to a 33x speedup over the original 28×2 NFE sampling.
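The "up to 33x" figure follows from simple effective-NFE arithmetic. A minimal sketch, assuming (as is standard but not stated in this summary) that the baseline's 28×2 NFE means 28 sampling steps, each costing two network evaluations because of classifier-free guidance:

```python
# Speedup = baseline NFEs / effective NFEs of the distilled model.
# Assumption: baseline = 28 steps x 2 evaluations per step (CFG).
baseline_nfe = 28 * 2  # 56 network function evaluations

for model, effective_nfe in [("SD3-Medium", 1.67), ("SD3.5-Large", 1.74)]:
    speedup = baseline_nfe / effective_nfe
    print(f"{model}: {baseline_nfe} / {effective_nfe:.2f} = {speedup:.1f}x")
```

This reproduces the reported numbers: roughly 33.5x for SD3-Medium and 32.2x for SD3.5-Large, hence "up to 33x."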

Abstract

Diffusion models produce high-quality text-to-image results, but their iterative denoising is computationally expensive. Distribution Matching Distillation (DMD) emerges as a promising path to few-step distillation, but suffers from diversity collapse and fidelity degradation when reduced to two steps or fewer. We present 1.x-Distill, the first fractional-step distillation framework that breaks the integer-step constraint of prior few-step methods and establishes 1.x-step generation as a practical regime for distilled diffusion models. Specifically, we first analyze the overlooked role of teacher CFG in DMD and introduce a simple yet effective modification to suppress mode collapse. Then, to improve performance under extreme steps, we introduce Stagewise Focused Distillation, a two-stage strategy that learns coarse structure through diversity-preserving distribution matching and refines details with inference-consistent adversarial distillation. Furthermore, we design a lightweight compensation module for Distill–Cache co-Training, which naturally incorporates block-level caching into our distillation pipeline. Experiments on SD3-Medium and SD3.5-Large show that 1.x-Distill surpasses prior few-step methods, achieving better quality and diversity at 1.67 and 1.74 effective NFEs, respectively, with up to 33x speedup over original 28x2 NFE sampling.