The Polynomial Stein Discrepancy for Assessing Moment Convergence

arXiv stat.ML / 5/1/2026


Key Points

  • The paper introduces the Polynomial Stein Discrepancy (PSD) as a way to measure how different a set of samples is from a target posterior distribution in Bayesian inference.
  • It argues that common diagnostics like effective sample size can be unreliable for scalable Bayesian samplers such as stochastic gradient Langevin dynamics, which can be asymptotically biased.
  • It motivates PSD by contrasting it with Kernel Stein Discrepancy (KSD), noting that KSD is expensive due to quadratic scaling and can be sensitive to dimensionality and hyperparameter tuning.
  • The authors prove the proposed goodness-of-fit test can detect differences in the first r moments for Gaussian targets, though it is not fully convergence-determining.
  • Experiments indicate the new test is more powerful than competing methods in several settings, with lower computational cost, and it can help practitioners choose hyperparameters more efficiently.
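To make the moment-detection idea concrete, the following is a minimal one-dimensional sketch (not the authors' estimator; the function names and the way statistics are aggregated are illustrative assumptions). It applies the Langevin Stein operator `(A f)(x) = f'(x) + f(x) * score(x)` to monomials up to degree r; by the Stein identity, each sample average should vanish under the target, so a mismatch in an early moment shows up as a nonzero entry.

```python
import numpy as np

def stein_stat(samples, score, r):
    """Average the Langevin Stein operator (A f)(x) = f'(x) + f(x) * score(x)
    over the samples, for each monomial test function f(x) = x^k, k = 1..r.
    Under the target distribution, every entry should be close to zero."""
    x = np.asarray(samples, dtype=float)
    s = score(x)
    feats = [k * x**(k - 1) + x**k * s for k in range(1, r + 1)]
    return np.array([f.mean() for f in feats])

rng = np.random.default_rng(0)
score = lambda x: -x                      # score of N(0, 1): d/dx log p(x) = -x

good = stein_stat(rng.normal(0.0, 1.0, 100_000), score, r=3)
bad = stein_stat(rng.normal(0.5, 1.0, 100_000), score, r=3)  # mean-shifted
# `good` stays near zero; `bad` reveals the first-moment mismatch.
```

Note the linear cost: one pass over the n samples per degree, in contrast to the pairwise evaluations a kernelized discrepancy requires.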

Abstract

We propose a novel method for measuring the discrepancy between a set of samples and a desired posterior distribution for Bayesian inference. Classical methods for assessing sample quality like the effective sample size are not appropriate for scalable Bayesian sampling algorithms, such as stochastic gradient Langevin dynamics, that are asymptotically biased. Instead, the gold standard is to use the kernel Stein discrepancy (KSD), which is itself not scalable given its quadratic cost in the number of samples. The KSD and its faster extensions also typically suffer from the curse of dimensionality and can require extensive tuning. To address these limitations, we develop the polynomial Stein discrepancy (PSD) and an associated goodness-of-fit test. While the new test is not fully convergence-determining, we prove that it detects differences in the first r moments for Gaussian targets. We empirically show that the test has higher power than its competitors in several examples, and at a lower computational cost. Finally, we demonstrate that the PSD can assist practitioners in selecting hyper-parameters of Bayesian sampling algorithms more efficiently than competitors.
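The "quadratic cost in the number of samples" can be seen directly in a KSD estimate: the Stein kernel must be evaluated on all n × n sample pairs. Below is a minimal one-dimensional V-statistic sketch using an IMQ base kernel; the kernel choice, bandwidth, and 1-D setting are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def ksd_vstat(x, score):
    """V-statistic KSD estimate for a 1-D target, with IMQ base kernel
    k(x, y) = (1 + (x - y)^2)^(-1/2).  The n x n pairwise Stein-kernel
    matrix is what makes the cost quadratic in the sample size."""
    x = np.asarray(x, dtype=float)
    s = score(x)
    u = x[:, None] - x[None, :]               # all n^2 pairwise differences
    q = 1.0 + u**2
    k = q**-0.5
    dkx = -u * q**-1.5                        # d/dx k
    dky = u * q**-1.5                         # d/dy k
    dkxy = q**-1.5 - 3.0 * u**2 * q**-2.5     # d^2/(dx dy) k
    kp = (dkxy + s[:, None] * dky + s[None, :] * dkx
          + s[:, None] * s[None, :] * k)      # Stein kernel k_p(x_i, x_j)
    return np.sqrt(kp.mean())

rng = np.random.default_rng(1)
score = lambda x: -x                          # score of the N(0, 1) target

on_target = ksd_vstat(rng.normal(0.0, 1.0, 500), score)
off_target = ksd_vstat(rng.normal(1.0, 1.0, 500), score)
# off_target exceeds on_target, flagging the mean-shifted samples.
```

Doubling the sample size quadruples the work here, which is the scalability bottleneck the PSD is designed to avoid.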