On the role of memorization in learned priors for geophysical inverse problems

arXiv stat.ML / 3/23/2026

Key Points

  • The paper analyzes memorization risks in learned priors trained on limited geophysical data for seismic inversion.
  • It shows that the posterior under a memorized prior behaves like a reweighted empirical distribution, effectively a likelihood-weighted lookup of training examples.
  • For diffusion models, memorization yields a closed-form Gaussian mixture prior; linearizing the forward operator around each training example then yields a Gaussian mixture posterior whose component widths and shifts are governed by the local Jacobian.
  • The authors validate these predictions on a stylized inverse problem and illustrate the practical consequences through diffusion posterior sampling for full waveform inversion.
  • These results highlight potential memorization effects when using data-driven priors in geophysics and suggest caution in training and applying such models with scarce data.
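
The "likelihood-weighted lookup" in the second point follows in a few lines from Bayes' rule. The sketch below is a worked derivation under stated assumptions, not the paper's own notation: the symbols x_1, ..., x_N for the training models, F for the forward operator, y for the observed data, and sigma for the standard deviation of additive Gaussian noise are all introduced here for illustration.

```latex
% Assumed notation (not from the source): training models x_1,...,x_N,
% forward operator F, data y = F(x) + \varepsilon, \varepsilon \sim \mathcal{N}(0, \sigma^2 I).
\begin{align}
p_{\mathrm{mem}}(x) &= \frac{1}{N}\sum_{i=1}^{N}\delta(x - x_i), \\
p(x \mid y) &\propto p(y \mid x)\,p_{\mathrm{mem}}(x)
  = \frac{1}{N}\sum_{i=1}^{N} p(y \mid x_i)\,\delta(x - x_i), \\
p(x \mid y) &= \sum_{i=1}^{N} w_i\,\delta(x - x_i),
\qquad
w_i = \frac{p(y \mid x_i)}{\sum_{j=1}^{N} p(y \mid x_j)}, \\
w_i &\propto \exp\!\left(-\frac{\lVert y - F(x_i)\rVert^2}{2\sigma^2}\right)
\quad \text{(Gaussian noise case)}.
\end{align}
```

Under these assumptions the posterior places mass only on the stored training examples, so the data can reweight them but cannot produce a model outside the training set.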

Abstract

Learned priors based on deep generative models offer data-driven regularization for seismic inversion, but training them requires a dataset of representative subsurface models, a resource that is inherently scarce in geoscience applications. Since the training objective of most generative models can be cast as maximum likelihood on a finite dataset, any such model risks converging to the empirical distribution, effectively memorizing the training examples rather than learning the underlying geological distribution. We show that the posterior under such a memorized prior reduces to a reweighted empirical distribution, i.e., a likelihood-weighted lookup among the stored training examples. For diffusion models specifically, memorization yields a Gaussian mixture prior in closed form, and linearizing the forward operator around each training example gives a Gaussian mixture posterior whose components have widths and shifts governed by the local Jacobian. We validate these predictions on a stylized inverse problem and demonstrate the consequences of memorization through diffusion posterior sampling for full waveform inversion.
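
The two posterior forms described in the abstract can be illustrated numerically. The following is a minimal sketch under assumptions that are not in the source: a toy forward operator `forward(x) = A @ tanh(x)` stands in for the seismic modeling operator, the noise is additive Gaussian with standard deviation `sigma`, and the memorized diffusion prior is taken to be an equal-weight mixture of isotropic Gaussians of width `s` centered on the training examples. The function names `lookup_weights` and `gm_posterior` are hypothetical. The mixture formulas are the standard linear-Gaussian conjugacy results, applied per component after linearizing around each training example.

```python
import numpy as np

def lookup_weights(y, train, forward, sigma):
    """Weights of the reweighted empirical posterior: w_i ∝ p(y | x_i)."""
    resid = np.array([y - forward(x) for x in train])
    log_w = -0.5 * np.sum(resid**2, axis=1) / sigma**2
    w = np.exp(log_w - log_w.max())  # subtract max for numerical stability
    return w / w.sum()

def gm_posterior(y, train, forward, jacobian, sigma, s):
    """Gaussian mixture posterior from linearizing forward() around each x_i.

    Assumed prior: (1/N) * sum_i N(x; x_i, s^2 I). Assumed noise: N(0, sigma^2 I).
    """
    n = train.shape[1]
    means, covs, log_w = [], [], []
    for x in train:
        J = jacobian(x)        # local Jacobian at the training example
        r = y - forward(x)     # data residual at the training example
        # Component covariance ("width"): (I/s^2 + J^T J / sigma^2)^{-1}
        cov = np.linalg.inv(np.eye(n) / s**2 + J.T @ J / sigma**2)
        # Component mean ("shift" away from x_i)
        mean = x + cov @ J.T @ r / sigma**2
        # Component weight: marginal likelihood N(y; F(x_i), sigma^2 I + s^2 J J^T)
        S = sigma**2 * np.eye(len(y)) + s**2 * J @ J.T
        _, logdet = np.linalg.slogdet(S)
        log_w.append(-0.5 * (r @ np.linalg.solve(S, r) + logdet))
        means.append(mean)
        covs.append(cov)
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())
    return w / w.sum(), np.array(means), np.array(covs)

# Toy setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3))

def forward(x):
    return A @ np.tanh(x)

def jacobian(x):
    # d/dx_j of sum_k A[i, k] * tanh(x_k) = A[i, j] * (1 - tanh(x_j)^2)
    return A * (1.0 - np.tanh(x) ** 2)

train = rng.normal(size=(5, 3))   # five stored "training models"
y = forward(train[2])             # noiseless data generated from example 2

w_lookup = lookup_weights(y, train, forward, sigma=0.1)
w_gm, means, covs = gm_posterior(y, train, forward, jacobian, sigma=0.1, s=0.05)
```

As `s` shrinks, each component covariance is bounded by `s**2` and each mean collapses onto its training example, so the mixture posterior approaches the pure lookup given by `lookup_weights`.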