AI Navigate

Curriculum Sampling: A Two-Phase Curriculum for Efficient Training of Flow Matching

arXiv cs.LG / 3/16/2026


Key Points

  • The authors show that the common static middle-biased timestep sampling in Flow Matching creates a speed-quality trade-off: faster early convergence but worse long-term fidelity compared to Uniform sampling.
  • They reveal a U-shaped training difficulty across timesteps, with persistent errors near the boundaries, meaning endpoints can’t be under-sampled without losing detail.
  • They propose Curriculum Sampling: a two-phase schedule that starts with middle-biased sampling to learn structure quickly, then switches to Uniform sampling for boundary refinement.
  • Empirical results on CIFAR-10 show an FID improvement from 3.85 (Uniform baseline) to 3.22, with peak performance reached at 100k rather than 150k training steps, i.e., both faster and higher-quality training.
  • The broader implication is that timestep sampling should be treated as an evolving curriculum, not a fixed hyperparameter.
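The two-phase schedule in the key points can be sketched as a drop-in timestep sampler. This is a minimal illustration, not the authors' code: the switch point (`switch_step=50_000`) and the Logit-Normal parameters (mu=0, sigma=1) are assumed values for demonstration only.

```python
import numpy as np

def sample_timesteps(step, batch_size, switch_step=50_000, rng=None):
    """Curriculum Sampling sketch: middle-biased (Logit-Normal) timesteps
    early in training, then Uniform timesteps after `switch_step`.

    `switch_step` and the Logit-Normal parameters (mu=0, sigma=1) are
    illustrative assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    if step < switch_step:
        # Phase 1: Logit-Normal concentrates probability mass near
        # t = 0.5, emphasizing rapid mid-trajectory structure learning.
        z = rng.normal(loc=0.0, scale=1.0, size=batch_size)
        t = 1.0 / (1.0 + np.exp(-z))  # sigmoid maps R -> (0, 1)
    else:
        # Phase 2: Uniform sampling restores coverage of the boundary
        # regimes (t near 0 and 1) for fine-detail refinement.
        t = rng.uniform(0.0, 1.0, size=batch_size)
    return t
```

In a training loop, this replaces the usual `t = rng.uniform(0, 1, batch_size)` call; everything else in the Flow Matching objective stays unchanged.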

Abstract

Timestep sampling p(t) is a central design choice in Flow Matching models, yet common practice increasingly favors static middle-biased distributions (e.g., Logit-Normal). We show that this choice induces a speed-quality trade-off: middle-biased sampling accelerates early convergence but yields worse asymptotic fidelity than Uniform sampling. By analyzing per-timestep training losses, we identify a U-shaped difficulty profile with persistent errors near the boundary regimes, implying that under-sampling the endpoints leaves fine details unresolved. Guided by this insight, we propose Curriculum Sampling, a two-phase schedule that begins with middle-biased sampling for rapid structure learning and then switches to Uniform sampling for boundary refinement. On CIFAR-10, Curriculum Sampling improves the best FID from 3.85 (Uniform) to 3.22 while reaching peak performance at 100k rather than 150k training steps. Our results highlight that timestep sampling should be treated as an evolving curriculum rather than a fixed hyperparameter.
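The per-timestep loss analysis described in the abstract amounts to binning individual training losses by their sampled timestep and inspecting the resulting profile. A hedged sketch of that diagnostic follows; the binning scheme (`n_bins` equal-width bins) is an illustrative assumption, not the paper's exact procedure.

```python
import numpy as np

def per_timestep_loss_profile(ts, losses, n_bins=20):
    """Bin per-sample Flow Matching losses by timestep t to expose the
    training-difficulty profile across [0, 1]. The paper reports a
    U-shape: persistent error near t = 0 and t = 1. The equal-width
    binning here is an illustrative assumption.
    """
    ts = np.asarray(ts, dtype=float)
    losses = np.asarray(losses, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Map each timestep to a bin index in [0, n_bins - 1].
    idx = np.clip(np.digitize(ts, edges) - 1, 0, n_bins - 1)
    profile = np.full(n_bins, np.nan)
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            profile[b] = losses[mask].mean()
    return edges, profile
```

Plotting `profile` against the bin centers makes the U-shape (or its absence) visible, which is what motivates switching to Uniform sampling for the refinement phase.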