Warm-Start Flow Matching for Guaranteed Fast Text/Image Generation

arXiv cs.LG / 3/23/2026


Key Points

  • This paper proposes Warm-Start Flow Matching (WS-FM), a method that speeds up sample generation for flow matching-based generative models by using lightweight draft samples as the initial distribution.
  • By starting the flow from draft samples at a time close to the end time, rather than from pure noise at time 0, WS-FM cuts the number of time steps by a guaranteed factor without compromising sample quality.
  • The approach is described as a learning-to-refine paradigm, transforming low-quality drafts into high-quality samples.
  • Experiments on synthetic toy data and real-world text and image generation tasks demonstrate guaranteed speed-up while preserving output quality.
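
The step-count argument behind the speed-up claim can be made concrete with a little arithmetic. The sketch below assumes a fixed-step solver on the time interval [0, 1] and a warm-start time `t_start`; the function name, the step size, and the value 0.8 are illustrative choices, not figures from the paper.

```python
import math


def num_steps(t_start: float, step_size: float, t_end: float = 1.0) -> int:
    """Fixed-size solver steps needed to cover [t_start, t_end]."""
    if not 0.0 <= t_start < t_end:
        raise ValueError("t_start must lie in [0, t_end)")
    # Small tolerance so floating-point round-off does not add a spurious step.
    return math.ceil((t_end - t_start) / step_size - 1e-12)


# Hypothetical settings: same step size for both runs.
cold_steps = num_steps(t_start=0.0, step_size=0.01)  # pure-noise start
warm_steps = num_steps(t_start=0.8, step_size=0.01)  # draft-sample start
speedup = cold_steps / warm_steps

print(cold_steps, warm_steps, speedup)
```

Under these assumptions the speed-up depends only on the starting time: the step count shrinks by a factor of 1 / (1 - t_start), whatever step size is used.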

Abstract

Current auto-regressive (AR) LLMs, diffusion-based text/image generative models, and recent flow matching (FM) algorithms are capable of generating high-quality text and image samples. However, inference (sample generation) in these models is often very time-consuming and computationally demanding, mainly because of the large number of function evaluations, which scales with the number of tokens or the number of diffusion steps. This in turn demands heavy GPU resources, time, and electricity. In this work we propose a novel solution that reduces the sample generation time of flow matching algorithms by a guaranteed speed-up factor, without sacrificing the quality of the generated samples. Our key idea is to utilize computationally lightweight generative models whose generation time is negligible compared to that of the target AR/FM models. The draft samples from a lightweight model, which are fast to generate but not of satisfactory quality, are regarded as the initial distribution for an FM algorithm. Unlike conventional FM, which takes a pure noise (e.g., Gaussian or uniform) initial distribution, the draft samples are already of decent quality, so we can set the starting time closer to the end time rather than to 0 as in the pure noise FM case. This significantly reduces the number of time steps needed to reach the target data distribution, and the speed-up factor is guaranteed. Our idea, dubbed Warm-Start FM (WS-FM), can essentially be seen as a learning-to-refine generative model that maps low-quality draft samples to high-quality samples. As a proof of concept, we demonstrate the idea on synthetic toy data as well as real-world text and image generation tasks, illustrating that it offers a guaranteed speed-up in sample generation without sacrificing the quality of the generated samples.
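
To illustrate the difference between a pure-noise start and a warm start, here is a minimal one-dimensional sketch of Euler sampling. It is not the paper's implementation: the closed-form `velocity` stands in for a trained network and assumes straight-line paths toward a single target point, and `TARGET`, `euler_sample`, the draft noise level, and the starting time 0.8 are all hypothetical.

```python
import random

TARGET = 3.0  # toy "data distribution": a single point (assumption)


def velocity(x: float, t: float) -> float:
    """Stand-in for a learned velocity field.

    For straight-line paths x_t = (1 - t) * x_0 + t * TARGET, the exact
    velocity at (x, t) is (TARGET - x) / (1 - t).
    """
    return (TARGET - x) / (1.0 - t)


def euler_sample(x_init: float, t_start: float, n_total_steps: int):
    """Integrate from t_start to 1 on a grid of n_total_steps steps over [0, 1].

    Returns the final sample and the number of velocity evaluations used.
    """
    h = 1.0 / n_total_steps
    k_start = round(t_start * n_total_steps)  # grid index of the start time
    x, evals = x_init, 0
    for k in range(k_start, n_total_steps):
        x += h * velocity(x, k * h)
        evals += 1
    return x, evals


random.seed(0)

# Cold start: pure Gaussian noise at time 0.
x_cold, evals_cold = euler_sample(random.gauss(0.0, 1.0), 0.0, 100)

# Warm start: a rough draft near the target, treated as the state at time 0.8.
draft = TARGET + random.gauss(0.0, 0.3)
x_warm, evals_warm = euler_sample(draft, 0.8, 100)

print(f"cold: x={x_cold:.4f}, evaluations={evals_cold}")
print(f"warm: x={x_warm:.4f}, evaluations={evals_warm}")
```

Both runs use the same step size, so the warm start simply skips the part of the trajectory before its starting time. In a real system the velocity field would be a network trained with the draft distribution as its source, and each skipped step would save one network evaluation.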