Generative models on phase space

arXiv cs.AI / 4/6/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper studies how deep generative models (notably diffusion and flow matching) can learn and sample from high-dimensional distributions when the data lies on a physically constrained submanifold of the embedding space.
  • For high-energy physics events represented as relativistic energy-momentum 4-vectors, it argues that learning conservation laws only approximately, rather than enforcing them exactly, hurts the interpretability and reliability of generative models.
  • It proposes generative models that are constructed to remain, at every sampling step, on the manifold of massless N-particle Lorentz-invariant phase space in the center-of-momentum frame.
  • For diffusion models, it shows that the forward “pure noise” endpoint corresponds to a uniform distribution on phase space, offering a principled baseline for analyzing how particle correlations emerge during reverse denoising.
  • The authors demonstrate learning of few-particle and many-particle distributions with different singularity structures and position the work for future interpretability studies on simulated jet data.
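The "pure noise" endpoint described above, a uniform distribution on massless N-particle Lorentz-invariant phase space in the center-of-momentum frame, can be sampled directly with the classic RAMBO algorithm (Kleiss, Stirling, and Ellis). A minimal NumPy sketch follows; the function name and interface are illustrative, and the paper's own construction need not use RAMBO itself:

```python
import numpy as np

def rambo(n_particles, sqrt_s, seed=None):
    """Sample n massless 4-momenta uniformly on Lorentz-invariant phase
    space, with total momentum (sqrt_s, 0, 0, 0) in the CM frame.
    Returns an array of shape (n_particles, 4) with rows (E, px, py, pz)."""
    rng = np.random.default_rng(seed)
    u = rng.random((4, n_particles))
    # isotropic massless momenta with exponentially distributed energies
    cos_t = 2.0 * u[0] - 1.0
    sin_t = np.sqrt(1.0 - cos_t**2)
    phi = 2.0 * np.pi * u[1]
    q0 = -np.log(u[2] * u[3])
    q = np.stack([q0,
                  q0 * sin_t * np.cos(phi),
                  q0 * sin_t * np.sin(phi),
                  q0 * cos_t], axis=1)
    # conformal boost + rescale so the momenta sum to (sqrt_s, 0, 0, 0)
    Q = q.sum(axis=0)
    M = np.sqrt(Q[0]**2 - Q[1:] @ Q[1:])
    b = -Q[1:] / M                  # boost vector
    gamma = Q[0] / M
    a = 1.0 / (1.0 + gamma)
    x = sqrt_s / M                  # overall scale factor
    bq = q[:, 1:] @ b               # b . q_i for each particle
    E = x * (gamma * q[:, 0] + bq)
    p3 = x * (q[:, 1:] + np.outer(q[:, 0], b) + np.outer(a * bq, b))
    return np.column_stack([E, p3])
```

Because the boost-and-rescale map is the same for every event, the resulting momenta conserve total energy-momentum exactly and stay massless, which is the on-manifold property the paper builds into its sampling trajectories.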

Abstract

Deep generative models such as diffusion and flow matching are powerful machine learning tools capable of learning and sampling from high-dimensional distributions. They are particularly useful when the training data appears to be concentrated on a submanifold of the data embedding space. For high-energy physics data, consisting of collections of relativistic energy-momentum 4-vectors, this submanifold can enforce extremely strong physically-motivated priors, such as energy and momentum conservation. If these constraints are learned only approximately, rather than exactly, this can inhibit the interpretability and reliability of such generative models. To remedy this deficiency, we introduce generative models which are, by construction, confined at every step of their sampling trajectory to the manifold of massless N-particle Lorentz-invariant phase space in the center-of-momentum frame. In the case of diffusion models, the "pure noise" forward process endpoint corresponds to the uniform distribution on phase space, which provides a clear starting point from which to identify how correlations among the particles emerge during the reverse (de-noising) process. We demonstrate that our models are able to learn both few-particle and many-particle distributions with various singularity structures, paving the way for future interpretability studies using generative models trained on simulated jet data.