Neural Galerkin Normalizing Flow for Transition Probability Density Functions of Diffusion Models

arXiv cs.LG / 3/20/2026

📰 News · Models & Research

Key Points

  • The paper introduces a Neural Galerkin Normalizing Flow framework to approximate the transition probability density function of a diffusion process by solving the Fokker-Planck equation with an atomic initial distribution, parameterized by the initial mass location.
  • Normalizing Flows are used to express the solution as a transformation of the transition density of a reference stochastic process, ensuring positivity and mass conservation constraints.
  • The approach extends Neural Galerkin methods to Normalizing Flows and derives an ordinary differential equation (ODE) system for the time evolution of the flow parameters.
  • Adaptive sampling concentrates evaluation of the Fokker-Planck residual in informative regions, which is essential for high-dimensional PDEs, enabling accurate capture of key solution features and of the causal relation between initial data and future densities.
  • After offline training, online evaluation becomes significantly cheaper than solving the PDE from scratch, positioning the method as a promising surrogate for many-query problems like Bayesian inference, simulation, and diffusion-bridge generation.
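The structure-preservation claim in the second bullet rests on the change-of-variables formula: a density expressed as the pushforward of a reference density through an invertible map is automatically nonnegative and integrates to one. The snippet below is a minimal sketch of that mechanism only, not the paper's method; the 1-D affine map and Gaussian reference density are illustrative assumptions standing in for the learned flow and the reference process.

```python
import numpy as np

def reference_density(z):
    """Standard normal reference density (stand-in for the reference process)."""
    return np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)

def pushforward_density(x, scale, shift):
    """Density of T(Z) for the invertible map T(z) = scale * z + shift,
    via the change-of-variables formula p(x) = p_ref(T^{-1}(x)) |dT^{-1}/dx|."""
    z = (x - shift) / scale   # T^{-1}(x)
    jac = 1.0 / scale         # |d T^{-1} / dx|
    return reference_density(z) * abs(jac)

# Positivity and unit mass hold by construction, with no extra constraints.
x = np.linspace(-10.0, 10.0, 20001)
p = pushforward_density(x, scale=2.0, shift=1.0)
mass = np.sum(p) * (x[1] - x[0])   # Riemann-sum approximation of the integral
print(f"min density: {p.min():.3e}, total mass: {mass:.6f}")
```

A learned flow replaces the affine map with a deep invertible network, but the positivity and mass-conservation argument is identical.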

Abstract

We propose a new Neural Galerkin Normalizing Flow framework to approximate the transition probability density function of a diffusion process by solving the corresponding Fokker-Planck equation with an atomic initial distribution, parametrically with respect to the location of the initial mass. By using Normalizing Flows, we look for the solution as a transformation of the transition probability density function of a reference stochastic process, ensuring that our approximation is structure-preserving and automatically satisfies positivity and mass conservation constraints. By extending Neural Galerkin schemes to the context of Normalizing Flows, we derive a system of ODEs for the time evolution of the Normalizing Flow's parameters. Adaptive sampling routines are used to evaluate the Fokker-Planck residual in meaningful locations, which is of vital importance to address high-dimensional PDEs. Numerical results show that this strategy captures key features of the true solution and enforces the causal relationship between the initial datum and the density function at subsequent times. After completing an offline training phase, online evaluation becomes significantly more cost-effective than solving the PDE from scratch. The proposed method serves as a promising surrogate model, which could be deployed in many-query problems associated with stochastic differential equations, like Bayesian inference, simulation, and diffusion bridge generation.
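The Neural Galerkin idea described in the abstract — projecting the Fokker-Planck dynamics onto the tangent space of a parametric family to obtain an ODE for the parameters — can be sketched in a toy setting. Everything below is an illustrative assumption rather than the paper's implementation: a 1-D Ornstein-Uhlenbeck process, a Gaussian ansatz in place of a normalizing flow, finite-difference derivatives, and sampling points from the current ansatz as a crude stand-in for the paper's adaptive residual sampling.

```python
import numpy as np

def gaussian(x, theta):
    """Parametric ansatz p(x; theta) with theta = (mean, variance)."""
    m, v = theta
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

def fp_rhs(x, theta, h=1e-4):
    """Fokker-Planck right-hand side for dX = -X dt + sqrt(2) dW:
    L p = d/dx (x p) + d^2 p / dx^2, via central finite differences."""
    p = lambda y: gaussian(y, theta)
    drift = ((x + h) * p(x + h) - (x - h) * p(x - h)) / (2 * h)
    diff = (p(x + h) - 2 * p(x) + p(x - h)) / h**2
    return drift + diff

def theta_dot(theta, xs, eps=1e-5):
    """Least-squares Galerkin projection: solve J @ theta_dot ~= L p at the
    sample points xs, where J[:, k] = dp/dtheta_k."""
    J = np.empty((len(xs), len(theta)))
    for k in range(len(theta)):
        tp = np.array(theta, float)
        tm = np.array(theta, float)
        tp[k] += eps
        tm[k] -= eps
        J[:, k] = (gaussian(xs, tp) - gaussian(xs, tm)) / (2 * eps)
    return np.linalg.lstsq(J, fp_rhs(xs, theta), rcond=None)[0]

# Time-march the parameter ODE with explicit Euler, resampling each step from
# the current ansatz so the residual is evaluated where the density has mass.
rng = np.random.default_rng(0)
theta = np.array([1.0, 0.5])   # initial mean and variance of the ansatz
dt, steps = 1e-3, 1000
for _ in range(steps):
    xs = theta[0] + np.sqrt(theta[1]) * rng.standard_normal(400)
    theta = theta + dt * theta_dot(theta, xs)
print(f"mean ~ {theta[0]:.3f}, variance ~ {theta[1]:.3f}")
```

Because the Gaussian family is closed under the OU Fokker-Planck flow, the recovered parameter trajectory tracks the analytic solution (mean e^{-t} m0, variance 1 + (v0 - 1) e^{-2t}); in the paper's setting the ansatz is a normalizing flow and the same least-squares projection evolves its weights.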