SubFlow: Sub-mode Conditioned Flow Matching for Diverse One-Step Generation

arXiv cs.LG / 4/15/2026


Key Points

  • The paper identifies that existing one-step flow-matching models can suffer severe diversity degradation because MSE-trained class-conditional flows effectively learn a frequency-weighted mean over intra-class sub-modes, averaging out rare but valid variations.
  • It proposes SubFlow (Sub-mode Conditioned Flow Matching), which decomposes each class into fine-grained semantic sub-modes via clustering and conditions the flow on sub-mode indices to avoid “averaging distortion.”
  • By making each conditioned sub-distribution approximately unimodal, SubFlow targets individual modes more accurately and restores fuller mode coverage even in a single inference step.
  • SubFlow is designed to be plug-and-play, integrating into existing one-step frameworks like MeanFlow and Shortcut Models without architectural changes.
  • Experiments on ImageNet-256 show improved generation diversity (higher Recall) while keeping competitive image quality (FID), demonstrating broad compatibility across one-step generation approaches.
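The averaging-distortion claim in the first bullet can be illustrated with a toy numerical example (my own illustration, not from the paper): for a class whose samples split into two sub-modes of unequal frequency, the MSE-optimal unconditional predictor converges to the frequency-weighted mean of the sub-modes, a point that belongs to neither mode, while conditioning on a (here oracle-assigned) sub-mode index makes each conditioned target unimodal and recovers both modes exactly.

```python
import numpy as np

# Toy intra-class target distribution with two sub-modes of unequal
# frequency: 80% of samples near +1, 20% near -1.
rng = np.random.default_rng(0)
targets = np.where(rng.random(100_000) < 0.8, 1.0, -1.0)

# A constant predictor trained with MSE converges to the
# frequency-weighted mean of the sub-modes (~0.6 here), a value that
# lies in neither mode -- the "averaging distortion".
mse_optimum = targets.mean()

# Conditioning on a sub-mode index (an oracle assignment in this toy
# example) makes each conditioned target distribution unimodal, so the
# per-sub-mode MSE optimum hits each mode exactly.
sub_mode = (targets > 0).astype(int)
cond_optima = [targets[sub_mode == k].mean() for k in (0, 1)]

print(mse_optimum)   # close to 0.6
print(cond_optima)   # close to [-1.0, 1.0]
```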

Abstract

Flow matching has emerged as a powerful generative framework, with recent few-step methods achieving remarkable inference acceleration. However, we identify a critical yet overlooked limitation: these models suffer from severe diversity degradation, concentrating samples on dominant modes while neglecting rare but valid variations of the target distribution. We trace this degradation to averaging distortion: when trained with MSE objectives, class-conditional flows learn a frequency-weighted mean over intra-class sub-modes, causing the model to over-represent high-density modes while systematically neglecting low-density ones. To address this, we propose SubFlow, Sub-mode Conditioned Flow Matching, which eliminates averaging distortion by decomposing each class into fine-grained sub-modes via semantic clustering and conditioning the flow on sub-mode indices. Each conditioned sub-distribution is approximately unimodal, so the learned flow accurately targets individual modes with no averaging distortion, restoring full mode coverage in a single inference step. Crucially, SubFlow is entirely plug-and-play: it integrates seamlessly into existing one-step models such as MeanFlow and Shortcut Models without any architectural modifications. Extensive experiments on ImageNet-256 demonstrate that SubFlow yields substantial gains in generation diversity (Recall) while maintaining competitive image quality (FID), confirming its broad applicability across different one-step generation frameworks. Project page: https://yexionglin.github.io/subflow.
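As a rough sketch of how sub-mode conditioning could slot into conditional flow-matching training (my own reconstruction from the abstract: the function names, the plain k-means clustering, the linear-interpolation flow target, and the combined condition id are all assumptions, not the paper's implementation):

```python
import numpy as np

def kmeans_labels(features, k, iters=20, seed=0):
    """Assign each sample to one of k sub-modes with plain k-means
    (an assumed stand-in for the paper's semantic clustering)."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialization: robust for well-separated clusters.
    centers = [features[rng.integers(len(features))]]
    for _ in range(k - 1):
        d2 = np.min([((features - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(features[d2.argmax()])
    centers = np.stack(centers)
    labels = np.zeros(len(features), dtype=int)
    for _ in range(iters):
        d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = features[labels == j].mean(axis=0)
    return labels

def flow_matching_target(x0, x1, t):
    """Standard linear-interpolation flow-matching pair: the network
    regresses the constant velocity x1 - x0 at the point x_t."""
    x_t = (1.0 - t)[:, None] * x0 + t[:, None] * x1
    v_t = x1 - x0
    return x_t, v_t

def condition_ids(class_labels, sub_mode_labels, n_sub_modes):
    """SubFlow-style conditioning: fold class label and sub-mode index
    into one condition id, so each conditioned target distribution is
    approximately unimodal."""
    return class_labels * n_sub_modes + sub_mode_labels
```

The key design point, per the abstract, is that only the conditioning signal changes; the velocity regression and the network architecture are untouched, which is what makes the idea plug-and-play for frameworks like MeanFlow and Shortcut Models.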