Closed-form conditional diffusion models for data assimilation

arXiv stat.ML / 3/24/2026

Key Points

  • The paper proposes closed-form conditional diffusion models for data assimilation: the score function is evaluated analytically and used to generate system states conditioned on measured data.
  • Instead of training neural networks to approximate the score, it exploits the score's analytical tractability, using kernel density estimation to model the joint distribution of system states and their corresponding measurements so that the score can be evaluated efficiently.
  • The method supports black-box scenarios, enabling data assimilation without explicit knowledge of the system dynamics or measurement process.
  • Experiments on nonlinear assimilation tasks using Lorenz-63 and Lorenz-96 (with nonlinear measurement models) show improved accuracy over ensemble Kalman and particle filters when ensemble sizes are small to moderate.
  • Overall, the approach combines the ability to handle black-box systems with the strength of diffusion models in approximating complex, non-Gaussian distributions, which the authors argue gives it advantages over many widely used filtering methods.
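
To make the "closed-form score" idea concrete, here is one way such a score can be written down. This is an illustrative sketch, not necessarily the paper's exact formulation: it assumes N joint samples (x_i, y_i) of states and measurements, a Gaussian kernel of bandwidth h on the measurements, and a variance-exploding noise process with level \sigma_t on the states. The symbols h, \sigma_t and w_i are notation introduced here.

```latex
% Assumed kernel density estimate of the noised joint density
p_t(x, y) \approx \frac{1}{N} \sum_{i=1}^{N}
    \mathcal{N}\!\left(x;\, x_i,\, \sigma_t^2 I\right)\,
    \mathcal{N}\!\left(y;\, y_i,\, h^2 I\right)

% Since p_t(x \mid y) = p_t(x, y) / p_t(y) and p_t(y) does not depend on x,
% the conditional score equals the gradient of the log joint density:
\nabla_x \log p_t(x \mid y)
    = \sum_{i=1}^{N} w_i(x, y)\, \frac{x_i - x}{\sigma_t^2},
\qquad
w_i(x, y) = \frac{\exp\!\left(-\frac{\lVert x - x_i\rVert^2}{2\sigma_t^2}
                              -\frac{\lVert y - y_i\rVert^2}{2h^2}\right)}
                 {\sum_{j=1}^{N} \exp\!\left(-\frac{\lVert x - x_j\rVert^2}{2\sigma_t^2}
                              -\frac{\lVert y - y_j\rVert^2}{2h^2}\right)}
```

Under these assumptions the score is a softmax-weighted average over the ensemble, so no network training is needed, and only samples of states and measurements are required rather than explicit models of the dynamics or the measurement operator.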

Abstract

We propose closed-form conditional diffusion models for data assimilation. Diffusion models use data to learn the score function (defined as the gradient of the log-probability density of a data distribution), allowing them to generate new samples from the data distribution by reversing a noise injection process. While it is common to train neural networks to approximate the score function, we leverage the analytical tractability of the score function to assimilate the states of a system with measurements. To enable the efficient evaluation of the score function, we use kernel density estimation to model the joint distribution of the states and their corresponding measurements. The proposed approach also inherits the capability of conditional diffusion models of operating in black-box settings, i.e., the proposed data assimilation approach can accommodate systems and measurement processes without their explicit knowledge. The ability to accommodate black-box systems combined with the superior capabilities of diffusion models in approximating complex, non-Gaussian probability distributions means that the proposed approach offers advantages over many widely used filtering methods. We evaluate the proposed method on nonlinear data assimilation problems based on the Lorenz-63 and Lorenz-96 systems of moderate dimensionality and nonlinear measurement models. Results show the proposed approach outperforms the widely used ensemble Kalman and particle filters when small to moderate ensemble sizes are used.
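
The idea in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes Gaussian kernels, a variance-exploding noise schedule, and uses annealed Langevin dynamics as a stand-in sampler for reversing the noise injection process. The function names (`conditional_score`, `sample_conditional`) and the parameters `h`, `sigmas`, `steps` and `eps` are hypothetical.

```python
import numpy as np

def conditional_score(x, y, xs, ys, sigma, h):
    """Closed-form score (gradient w.r.t. x of log p_sigma(x | y)) for a
    Gaussian KDE built on joint samples (xs[i], ys[i]).

    x: (n, d) or (d,) query states; y: (m,) observed measurement;
    xs: (N, d) ensemble states; ys: (N, m) their simulated measurements;
    sigma: diffusion noise level; h: assumed kernel bandwidth on measurements.
    """
    x = np.atleast_2d(x)                                      # (n, d)
    d2_x = ((x[:, None, :] - xs[None, :, :]) ** 2).sum(-1)    # (n, N)
    d2_y = ((ys - y) ** 2).sum(-1)                            # (N,)
    logw = -d2_x / (2 * sigma**2) - d2_y / (2 * h**2)
    logw -= logw.max(axis=1, keepdims=True)                   # stabilise the softmax
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)
    return (w @ xs - x) / sigma**2                            # (n, d)

def sample_conditional(y, xs, ys, h, sigmas, n_samples, rng, steps=20, eps=0.1):
    """Draw approximate samples of the state given y using annealed Langevin
    dynamics (an assumed sampler), with sigmas ordered from large to small."""
    d = xs.shape[1]
    x = xs.mean(axis=0) + sigmas[0] * rng.standard_normal((n_samples, d))
    for sigma in sigmas:
        alpha = eps * sigma**2                                # step size scaled to the noise level
        for _ in range(steps):
            z = rng.standard_normal(x.shape)
            score = conditional_score(x, y, xs, ys, sigma, h)
            x = x + 0.5 * alpha * score + np.sqrt(alpha) * z
    return x
```

In a filtering loop, `xs` would be the forecast ensemble and `ys` the measurements simulated from it, both obtained by running the (possibly black-box) system and measurement process; the returned samples would then serve as the updated ensemble. The bandwidth `h` and the noise schedule are tuning choices that this sketch leaves open.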