Posterior Augmented Flow Matching

arXiv cs.CV / 5/4/2026


Key Points

  • Flow matching's (FM) training signal can become sparse and high-variance on high-dimensional image data because each sample supervises only one trajectory, which may lead to flow collapse and poor generalization.
  • The paper proposes Posterior-Augmented Flow Matching (PAFM), replacing single-target supervision with an expectation over valid target completions given an intermediate state and condition.
  • PAFM factorizes the intractable posterior into a likelihood term and a conditional prior term, then uses importance sampling to form a mixture over multiple candidate targets.
  • The authors prove PAFM provides an unbiased estimator of the original FM objective and significantly reduces gradient variance during training.
  • Experiments show PAFM improves generation quality by up to 3.4 FID50K across multiple model scales, architectures, and class/text-conditioned benchmarks, with negligible extra compute overhead; code is released on GitHub.
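The factorization in the third point can be sketched concretely. For a linear interpolant x_t = (1-t)·x0 + t·x1 with Gaussian source x0, the likelihood of an intermediate under a hypothesized endpoint is Gaussian, so posterior weights over candidate targets reduce to a softmax over log-likelihood plus log-prior. The snippet below is a minimal illustration of this weighting scheme, not the authors' implementation; the function names, shapes, and the uniform-prior default are assumptions.

```python
import numpy as np

def pafm_weights(x_t, t, candidates, log_prior):
    # Under the linear path x_t = (1-t) x0 + t x1 with x0 ~ N(0, I),
    # the intermediate satisfies x_t | x1 ~ N(t x1, (1-t)^2 I), so the
    # per-candidate log-likelihood is a scaled squared distance.
    sq_dist = np.sum((x_t - t * candidates) ** 2, axis=-1)
    log_lik = -sq_dist / (2.0 * (1.0 - t) ** 2)
    logits = log_lik + log_prior        # likelihood term + conditional prior term
    logits -= logits.max()              # numerical stability before exponentiating
    w = np.exp(logits)
    return w / w.sum()                  # normalized posterior mixture weights

def pafm_target(x_t, t, candidates, log_prior):
    # Replace the single-target velocity with a posterior-weighted mixture
    # of per-candidate velocities u(x_t | x1) = (x1 - x_t) / (1 - t).
    w = pafm_weights(x_t, t, candidates, log_prior)
    velocities = (candidates - x_t) / (1.0 - t)
    return (w[:, None] * velocities).sum(axis=0)
```

With a single candidate the mixture collapses to the ordinary FM velocity toward that endpoint, which is consistent with PAFM being a generalization of FM.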

Abstract

Flow matching (FM) trains a time-dependent vector field that transports samples from a simple prior to a complex data distribution. However, for high-dimensional images, each training sample supervises only a single trajectory and intermediate point, yielding an extremely sparse and high-variance training signal. This under-constrained supervision can cause flow collapse, where the learned dynamics memorize specific source-target pairings, mapping diverse inputs to overly similar outputs and failing to generalize. We introduce Posterior-Augmented Flow Matching (PAFM), a theoretically grounded generalization of FM that replaces single-target supervision with an expectation over an approximate posterior of valid target completions for a given intermediate state and condition. PAFM factorizes this intractable posterior into (i) the likelihood of the intermediate under a hypothesized endpoint and (ii) the prior probability of that endpoint under the condition, and uses an importance sampling scheme to construct a mixture over multiple candidate targets. We prove that PAFM yields an unbiased estimator of the original FM objective while substantially reducing gradient variance during training by aggregating information from many plausible continuation trajectories per intermediate. Finally, we show that PAFM improves over FM by up to 3.4 FID50K across different model scales (SiT-B/2 and SiT-XL/2), different architectures (SiT and MMDiT), and both class- and text-conditioned benchmarks (ImageNet and CC12M), with a negligible increase in compute overhead. Code: https://github.com/gstoica27/PAFM.git.
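The variance-reduction claim can be checked numerically in a toy setting. Averaging velocities over the posterior of endpoints is a conditional expectation, so by the law of total variance it cannot have higher variance than the single-target velocity. The 1-D experiment below is an illustration under assumed simplifications (a two-point "dataset" of endpoints, uniform prior, exact posterior over the discrete candidates), not a reproduction of the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
cands = np.array([-2.0, 2.0])   # toy 1-D "dataset" of endpoints
t = 0.5
singles, mixes = [], []
for _ in range(2000):
    x1 = rng.choice(cands)                   # true endpoint for this sample
    x0 = rng.normal()                        # Gaussian source sample
    x_t = (1 - t) * x0 + t * x1              # linear interpolant
    # Single-target FM velocity: supervised by one trajectory only.
    singles.append((x1 - x_t) / (1 - t))
    # Posterior mixture over all candidate endpoints (uniform prior):
    # weights follow from x_t | x1 ~ N(t x1, (1-t)^2).
    log_lik = -(x_t - t * cands) ** 2 / (2 * (1 - t) ** 2)
    w = np.exp(log_lik - log_lik.max())
    w /= w.sum()
    mixes.append(np.sum(w * (cands - x_t) / (1 - t)))

var_single = np.var(singles)
var_mix = np.var(mixes)
```

In this toy run the mixture target has strictly lower variance than the single-target signal while both share the same mean velocity field, mirroring the unbiased-but-lower-variance property the paper proves for PAFM.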
