Three Creates All: You Only Sample 3 Steps

arXiv cs.LG / 3/25/2026


Key Points

  • Diffusion models are still slow during inference because they require many sequential evaluations, and the paper identifies standard timestep conditioning as a key bottleneck for few-step sampling.
  • The proposed Multi-layer Time Embedding Optimization (MTEO) freezes the pretrained diffusion backbone and distills a small set of step-wise, layer-wise time embeddings from reference trajectories.
  • MTEO is designed to be plug-and-play with existing ODE solvers and claims to introduce no inference-time overhead while training only a tiny fraction of parameters.
  • Experiments across multiple datasets and diffusion backbones reportedly achieve state-of-the-art results for few-step sampling and narrow the performance gap between heavier distillation-based methods and lightweight approaches.
  • The authors state that code will be made available for reproducibility and adoption.

Abstract

Diffusion models deliver high-fidelity generation but remain slow at inference time due to many sequential network evaluations. We find that standard timestep conditioning becomes a key bottleneck for few-step sampling. Motivated by layer-dependent denoising dynamics, we propose Multi-layer Time Embedding Optimization (MTEO), which freezes the pretrained diffusion backbone and distills a small set of step-wise, layer-wise time embeddings from reference trajectories. MTEO is plug-and-play with existing ODE solvers, adds no inference-time overhead, and trains only a tiny fraction of parameters. Extensive experiments across diverse datasets and backbones show state-of-the-art performance in few-step sampling and substantially narrow the gap between distillation-based and lightweight methods. Code will be available.
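The abstract's core mechanism can be illustrated with a toy sketch: freeze a pretrained denoiser, make the *only* trainable parameters a small table of per-step, per-layer time embeddings, and fit them so a few-step trajectory matches a many-step reference trajectory. Everything below (the `TinyDenoiser` stand-in, layer count, sinusoidal reference embeddings, plain MSE distillation loss) is an illustrative assumption, not the paper's actual architecture or training recipe.

```python
# Hedged sketch of the MTEO idea: only per-(step, layer) time embeddings
# are trained; the backbone stays frozen. Names and losses are assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
EMB_DIM, N_LAYERS, FEW_STEPS = 16, 3, 3

class TinyDenoiser(nn.Module):
    """Stand-in frozen backbone: each layer is conditioned on a time embedding."""
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Linear(EMB_DIM, EMB_DIM) for _ in range(N_LAYERS))

    def forward(self, x, time_embs):
        # time_embs: one embedding per layer (layer-wise conditioning)
        for layer, e in zip(self.layers, time_embs):
            x = torch.tanh(layer(x + e))
        return x

backbone = TinyDenoiser()
for p in backbone.parameters():
    p.requires_grad_(False)  # MTEO keeps the pretrained backbone frozen

# The only trainable parameters: one embedding per (sampling step, layer).
mteo_embs = nn.Parameter(torch.zeros(FEW_STEPS, N_LAYERS, EMB_DIM))

def few_step_sample(x):
    # Few-step trajectory using the learned step-wise, layer-wise embeddings.
    for s in range(FEW_STEPS):
        x = backbone(x, mteo_embs[s])
    return x

def reference_sample(x, n_steps=12):
    # Many-step reference trajectory with the model's original conditioning,
    # faked here as a shared sinusoidal embedding broadcast to every layer.
    with torch.no_grad():
        for s in range(n_steps):
            t = torch.sin(torch.arange(EMB_DIM, dtype=torch.float32) * (s + 1))
            x = backbone(x, [t] * N_LAYERS)
    return x

# Distill: match the few-step output to the many-step reference output.
opt = torch.optim.Adam([mteo_embs], lr=1e-2)
x0 = torch.randn(8, EMB_DIM)
target = reference_sample(x0)
init_loss = nn.functional.mse_loss(few_step_sample(x0), target).item()
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(few_step_sample(x0), target)
    loss.backward()
    opt.step()
final_loss = loss.item()
```

At inference, `few_step_sample` just indexes the learned embedding table, which is why this kind of scheme adds no sampling-time overhead: no extra modules run, and the embedding lookup replaces the usual timestep-embedding computation.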