Convergence of projected stochastic natural gradient variational inference for various step size and sample or batch size schedules

arXiv stat.ML / 4/2/2026


Key Points

  • The paper studies projected stochastic natural gradient variational inference (NGVI) under the assumption that variational distributions belong to an exponential family.
  • It proves new non-asymptotic convergence guarantees that cover different combinations of step-size schedules (constant or decreasing) and sample/batch-size schedules (constant or increasing).
  • With fixed hyperparameters, the method is shown to converge geometrically to a neighborhood of the optimum rather than necessarily reaching the exact optimum.
  • For the other schedule combinations, the authors establish convergence to the optimum at rates of the form O(1/T^ρ), where ρ ≥ 1 is attainable depending on the step-size and sample/batch-size schedules.
  • The results apply when the target posterior is “close” to the chosen exponential family, and they are positioned as a principled way to trade off speed, compute resources, and accuracy in NGVI.
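To make the setup above concrete, here is a minimal, hedged sketch of a projected stochastic natural-gradient step for a toy conjugate case: a 1-D Gaussian variational family with natural parameters η = (μ/σ², −1/(2σ²)). In such conjugate settings a natural-gradient step reduces to a convex combination of the current natural parameters and a noisy estimate of the target's natural parameters, followed by a projection that keeps η in the valid exponential-family domain (variance > 0). All function names, constants, and the noise model are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def natural_params(mu, var):
    """Natural parameters of N(mu, var): (mu/var, -1/(2*var))."""
    return np.array([mu / var, -0.5 / var])

def projected_sngvi(lam_hat, eta0, T, step_fn, eta2_cap=-1e-6):
    """Illustrative projected stochastic NGVI loop (toy conjugate case).

    Each iteration moves eta toward a noisy estimate lam_hat(t) of the
    target's natural parameters,
        eta <- Proj((1 - b_t) * eta + b_t * lam_hat(t)),
    where Proj clips the second coordinate below eta2_cap so that the
    implied variance stays positive (the exponential-family domain).
    """
    eta = np.array(eta0, dtype=float)
    for t in range(T):
        b = step_fn(t)                      # step-size schedule
        eta = (1.0 - b) * eta + b * lam_hat(t)
        eta[1] = min(eta[1], eta2_cap)      # projection onto the domain
    return eta

rng = np.random.default_rng(0)
lam = natural_params(mu=2.0, var=0.5)       # target posterior N(2, 0.5)
# Shrinking noise stands in for a growing sample/batch size (assumption).
noisy = lambda t: lam + rng.normal(0.0, 0.2, 2) / np.sqrt(t + 1)
eta_T = projected_sngvi(noisy, eta0=natural_params(0.0, 10.0), T=500,
                        step_fn=lambda t: 0.5)   # constant step size
mu_T = -0.5 * eta_T[0] / eta_T[1]           # recover the mean from eta
```

With a constant step and shrinking gradient noise, the iterate contracts geometrically toward the target's natural parameters, matching the flavor of the geometric-to-a-neighborhood result summarized above.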

Abstract

Stochastic natural gradient variational inference (NGVI) is a popular and efficient algorithm for Bayesian inference. Despite empirical success, the convergence of this method is still not fully understood. In this work, we define and study a projected stochastic NGVI when variational distributions form an exponential family. Stochasticity arises when either gradients are intractable expectations or large sums. We prove new non-asymptotic convergence results for combinations of constant or decreasing step sizes and constant or increasing sample/batch sizes. When all hyperparameters are fixed, NGVI is shown to converge geometrically to a neighborhood of the optimum, while we establish convergence to the optimum with rates of the form \mathcal{O}\left(\frac{1}{T^{\rho}} \right), possibly with \rho \geq 1, for all other combinations of step size and sample/batch size schedules. These rates apply when the target posterior distribution is close in some sense to the considered exponential family. Our theoretical results extend existing NGVI and stochastic optimization results and provide more flexibility to adjust, in a principled way, step sizes and sample/batch sizes in order to meet speed, resources, or accuracy constraints.
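The abstract's schedule combinations can be written down as plain functions of the iteration counter; the sketch below is only an illustration of the four regimes (constant or decreasing step size, constant or increasing sample/batch size), with placeholder constants and decay forms that are assumptions, not values taken from the paper.

```python
def constant_step(t, b0=0.5):
    return b0                       # fixed step size

def decreasing_step(t, b0=0.5):
    return b0 / (t + 1)             # O(1/t) decay (one common choice)

def constant_batch(t, n0=32):
    return n0                       # fixed sample/batch size

def increasing_batch(t, n0=32):
    return n0 * (t + 1)             # linearly growing batch (one choice)

# Regime labels are informal summaries of the results described above.
schedules = {
    "geometric, to a neighborhood": (constant_step, constant_batch),
    "to the optimum (decaying step)": (decreasing_step, constant_batch),
    "to the optimum (growing batch)": (constant_step, increasing_batch),
}
```

In practice one would pass a pair from `schedules` into an NGVI loop to trade off speed, compute, and accuracy, as the paper's guarantees suggest.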