A unified perspective on fine-tuning and sampling with diffusion and flow models

arXiv stat.ML / 5/4/2026


Key Points

  • The paper studies how to train diffusion and flow generative models to sample from target distributions formed by exponentially tilting a base density (the formulation is written out after this list), covering both sampling from unnormalized targets and reward fine-tuning of pretrained models.
  • It proposes a unified framework connecting two viewpoints: stochastic optimal control (SOC), approached via adjoint-based or score-matching methods, and non-equilibrium thermodynamics.
  • The authors provide bias–variance decompositions showing that Adjoint Matching/Sampling and Novel Score Matching have finite gradient variance, while Target and Conditional Score Matching do not.
  • They also derive norm bounds on the lean adjoint ODE that theoretically support the effectiveness of adjoint-based methods, and they adapt the CMCD and NETS loss functions, together with novel Crooks and Jarzynski identities (classical forms are shown below), to the exponential-tilting setting.
  • Experimental validation is provided via reward fine-tuning on Stable Diffusion 1.5 and Stable Diffusion 3, demonstrating the practical relevance of the theoretical results.
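
For concreteness, the exponential-tilting target referenced above is standardly written as follows; the notation here ($p_{\mathrm{base}}$ for the base density, $r$ for the reward, $\lambda > 0$ for a temperature) is a common convention and may differ from the paper's:

$$
p^{*}(x) \;=\; \frac{1}{Z}\, p_{\mathrm{base}}(x)\, \exp\!\left(\frac{r(x)}{\lambda}\right),
\qquad
Z \;=\; \int p_{\mathrm{base}}(x)\, \exp\!\left(\frac{r(x)}{\lambda}\right) dx .
$$

Taking $p_{\mathrm{base}}$ to be a pretrained diffusion or flow model and $r$ a reward model gives the fine-tuning problem; taking $r(x) = \log \tilde{p}(x) - \log p_{\mathrm{base}}(x)$ for an unnormalized density $\tilde{p}$ recovers sampling from $\tilde{p}$.

The Crooks and Jarzynski identities that the paper extends read, in their classical non-equilibrium forms (with work $W$, free-energy difference $\Delta F$, and inverse temperature $\beta$):

$$
\mathbb{E}\!\left[e^{-\beta W}\right] \;=\; e^{-\beta \Delta F}
\quad \text{(Jarzynski)},
\qquad
\frac{p_{F}(W)}{p_{R}(-W)} \;=\; e^{\beta (W - \Delta F)}
\quad \text{(Crooks)}.
$$

The tilted-setting versions are stated in the paper itself; the displays above are only the standard identities they generalize.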

Abstract

We study the problem of training diffusion and flow generative models to sample from target distributions defined by an exponential tilting of a base density, a formulation that subsumes both sampling from unnormalized densities and reward fine-tuning of pre-trained models. This problem can be approached from a stochastic optimal control (SOC) perspective, using adjoint-based or score matching methods, or from a non-equilibrium thermodynamics perspective. We provide a unified framework encompassing these approaches and make three main contributions: (i) bias-variance decompositions revealing that Adjoint Matching/Sampling and Novel Score Matching have finite gradient variance, while Target and Conditional Score Matching do not; (ii) norm bounds on the lean adjoint ODE that theoretically support the effectiveness of adjoint-based methods; and (iii) adaptations of the CMCD and NETS loss functions, along with novel Crooks and Jarzynski identities, to the exponential tilting setting. We validate our analysis with reward fine-tuning experiments on Stable Diffusion 1.5 and 3.
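
As a minimal numerical illustration of what "sampling from the tilted target" means (not the paper's training method), the following self-contained Python sketch estimates a mean under $p^{*}$ by self-normalized importance sampling from the base model. The Gaussian base and quadratic reward are hypothetical choices for which the tilted target has a closed form:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup (not from the paper): base density N(0, 1),
# reward r(x) = -(x - 2)^2, temperature lambda = 1. The tilted target
# p*(x) ∝ p_base(x) exp(r(x)) is then the Gaussian N(4/3, 1/3).
def reward(x):
    return -(x - 2.0) ** 2

# Sample from the base model, then weight by exp(r(x)): self-normalized
# importance weights give expectations under the tilted target.
x = rng.standard_normal(200_000)
log_w = reward(x)
w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
w /= w.sum()

print("SNIS estimate of E[x] under the tilted target:", np.sum(w * x))
print("Closed-form tilted mean:", 4.0 / 3.0)
```

The methods studied in the paper instead train the model so that its samples follow $p^{*}$ directly, avoiding reweighting; the sketch only grounds the target that those methods aim at.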