Accelerated Parallel Tempering via Neural Transports

arXiv stat.ML / 3/26/2026

Key Points

  • The paper addresses limitations of Parallel Tempering (PT) MCMC, where performance degrades when adjacent tempered distributions have little overlap in high-dimensional, multimodal targets.
  • It proposes accelerating PT by integrating neural samplers (such as normalizing flows, diffusion models, and controlled diffusions) so that PT needs less overlap between adjacent distributions.
  • The framework runs the neural samplers in parallel, which avoids much of the computational burden they usually carry while preserving the asymptotic consistency of classical PT.
  • The authors provide theoretical and empirical evidence that the method improves sample quality, lowers computational cost relative to classical PT, and supports efficient estimation of free energies/normalizing constants.
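
For readers unfamiliar with why overlap matters, the block below writes out the textbook swap rule for classical PT. It is a sketch under assumptions, not notation taken from the paper: the geometric annealing path, the symbols \pi_0 (reference), \pi (target), \beta_n (inverse-temperature schedule) and x_n (state of chain n) are all illustrative choices.

```latex
% Assumed geometric path between a reference \pi_0 and the target \pi
\pi_{\beta}(x) \;\propto\; \pi_0(x)^{1-\beta}\,\pi(x)^{\beta},
\qquad 0 = \beta_0 < \beta_1 < \dots < \beta_N = 1 .

% Standard Metropolis acceptance probability for swapping the states
% of adjacent chains n and n+1
\alpha_n \;=\; \min\!\left\{ 1,\;
  \frac{\pi_{\beta_n}(x_{n+1})\,\pi_{\beta_{n+1}}(x_n)}
       {\pi_{\beta_n}(x_n)\,\pi_{\beta_{n+1}}(x_{n+1})} \right\}.
```

Normalising constants cancel in the ratio, so unnormalised densities are enough. When \pi_{\beta_n} and \pi_{\beta_{n+1}} barely overlap, a state typical under one is improbable under the other, the ratio collapses towards zero, and swaps are almost always rejected.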

Abstract

Markov Chain Monte Carlo (MCMC) algorithms are essential tools in computational statistics for sampling from unnormalised probability distributions, but can be fragile when targeting high-dimensional, multimodal, or complex target distributions. Parallel Tempering (PT) enhances MCMC's sample efficiency through annealing and parallel computation, propagating samples from tractable reference distributions to intractable targets via state swapping across interpolating distributions. The effectiveness of PT is limited by the often minimal overlap between adjacent distributions in challenging problems, which requires increasing the computational resources to compensate. We introduce a framework that accelerates PT by leveraging neural samplers -- including normalising flows, diffusion models, and controlled diffusions -- to reduce the required overlap. Our approach utilises neural samplers in parallel, circumventing the computational burden of neural samplers while preserving the asymptotic consistency of classical PT. We demonstrate theoretically and empirically on a variety of multimodal sampling problems that our method improves sample quality, reduces the computational cost compared to classical PT, and enables efficient free energy/normalising constant estimation.
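
To make the idea of a transport-assisted swap concrete, here is a minimal one-dimensional sketch. It is an illustration under stated assumptions, not the authors' implementation: the two Gaussian "adjacent" distributions, the affine map standing in for a trained normalising flow, and every function name are hypothetical. The swap proposes moving each state through the map (or its inverse) before exchanging, and corrects the Metropolis ratio with the Jacobian of the map, which is one standard way to keep such a deterministic proposal valid.

```python
import math


def log_normal_pdf(x, mean, std):
    """Log density of a 1-D Gaussian."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


# Hypothetical pair of adjacent distributions with very little overlap.
def log_pi_a(x):
    return log_normal_pdf(x, 0.0, 1.0)


def log_pi_b(x):
    return log_normal_pdf(x, 3.0, 0.5)


# Stand-in for a learned flow: the affine map T(x) = SHIFT + SCALE * x,
# chosen here so that it pushes pi_a exactly onto pi_b (an assumption).
SHIFT, SCALE = 3.0, 0.5


def forward(x):
    return SHIFT + SCALE * x


def inverse(y):
    return (y - SHIFT) / SCALE


def log_abs_det_jac(x):
    # log |dT/dx|; constant for an affine map, state-dependent in general.
    return math.log(abs(SCALE))


def classical_swap_log_ratio(x, y):
    """Log Metropolis ratio for exchanging x (chain a) and y (chain b)."""
    return log_pi_a(y) + log_pi_b(x) - log_pi_a(x) - log_pi_b(y)


def transported_swap_log_ratio(x, y):
    """Log ratio for proposing (x, y) -> (T^{-1}(y), T(x))."""
    x_new, y_new = inverse(y), forward(x)
    # Jacobian of the pair map: |det J_T(x)| / |det J_T(T^{-1}(y))|.
    log_jac = log_abs_det_jac(x) - log_abs_det_jac(x_new)
    return (log_pi_a(x_new) + log_pi_b(y_new) + log_jac
            - log_pi_a(x) - log_pi_b(y))


def accept_prob(log_ratio):
    return math.exp(min(0.0, log_ratio))


if __name__ == "__main__":
    x, y = 0.3, 3.1  # typical states under pi_a and pi_b respectively
    print("classical  :", accept_prob(classical_swap_log_ratio(x, y)))
    print("transported:", accept_prob(transported_swap_log_ratio(x, y)))
```

Because the map in this toy example is an exact transport, the transported swap is accepted with probability one, whereas the classical swap between the same two states is almost always rejected. A learned map would only be approximate, so acceptance would fall somewhere in between, and the accept/reject step is what absorbs the approximation error.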