A Review of Diffusion-based Simulation-Based Inference: Foundations and Applications in Non-Ideal Data Scenarios

arXiv stat.ML / 4/16/2026

Key Points

  • The paper reviews diffusion-model-based simulation-based inference (SBI) as a likelihood-free approach to learn posterior distributions from simulator outputs when classical likelihoods are intractable.
  • It explains the fundamentals of diffusion-based SBI and positions the approach as addressing shortcomings of earlier neural SBI methods such as neural likelihood/posterior estimation and normalizing flows.
  • The review focuses on robustness under three non-ideal data scenarios: simulator–reality model misspecification, irregular/infinite-dimensional observations, and missing data.
  • It surveys eight diffusion-based methods and highlights techniques such as conditional diffusion, guided diffusion, sequential/factorized designs for efficiency, and consistency models for faster sampling.
  • The article synthesizes theoretical conditions for accurate posterior recovery, then points to open problems and applications including geophysical uncertainty quantification.

Abstract

For complex simulation problems, intractable likelihoods often preclude the use of classical likelihood-based techniques for parameter inference. Simulation-based inference (SBI) methods offer a likelihood-free approach to directly learn posterior distributions p(θ | x_obs) from simulator outputs. Recently, diffusion models have emerged as promising tools for SBI, addressing limitations of earlier neural methods such as neural likelihood/posterior estimation and normalizing flows. This review examines diffusion-based SBI from first principles to applications, emphasizing robustness in three non-ideal data scenarios common to scientific computing: model misspecification (simulator-reality mismatch), unstructured or infinite-dimensional observations, and missing data. We synthesize mathematical foundations and survey eight methods addressing these challenges, such as conditional diffusion for irregular data, guided diffusion for prior adaptation, sequential and factorized approaches for efficiency, and consistency models for fast sampling. Throughout, we maintain consistent notation and emphasize the conditions required for accurate posteriors. We conclude with open problems and applications to geophysical uncertainty quantification, where these challenges are acute.
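
To make the likelihood-free setup concrete, the following is a minimal sketch of how training data for a conditional diffusion posterior estimator might be assembled. It is not taken from the review: the Gaussian prior, the toy simulator, the linear noise schedule, and every function name here are hypothetical choices, and the noising step assumes a standard variance-preserving (DDPM-style) forward process. Only samples from the prior and the simulator are used; the likelihood is never evaluated.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_prior(n):
    # Hypothetical prior: theta ~ N(0, 1), one parameter per sample.
    return rng.normal(0.0, 1.0, size=(n, 1))

def simulator(theta):
    # Hypothetical toy simulator: observation = parameter + Gaussian noise.
    return theta + 0.5 * rng.normal(size=theta.shape)

def make_training_batch(n, num_steps=100, beta_min=1e-4, beta_max=0.02):
    """Return (theta_t, t, x, eps) for conditional denoising training."""
    theta0 = sample_prior(n)            # clean parameters from the prior
    x = simulator(theta0)               # paired simulator outputs
    # Assumed linear beta schedule and its cumulative signal level.
    betas = np.linspace(beta_min, beta_max, num_steps)
    alpha_bar = np.cumprod(1.0 - betas)
    t = rng.integers(0, num_steps, size=n)   # random diffusion step per sample
    a = alpha_bar[t][:, None]
    eps = rng.normal(size=theta0.shape)      # noise the network would predict
    # Variance-preserving forward noising applied to the parameters only.
    theta_t = np.sqrt(a) * theta0 + np.sqrt(1.0 - a) * eps
    return theta_t, t, x, eps

theta_t, t, x, eps = make_training_batch(1024)
```

In a sketch like this, a network taking (theta_t, t, x) as input would be regressed onto eps with a squared-error loss; at inference time the reverse process would be run conditioned on x_obs to draw approximate posterior samples.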