Convergence of projected stochastic natural gradient variational inference for various step size and sample or batch size schedules

arXiv stat.ML / 4/2/2026


Key Points

  • The paper studies projected stochastic natural gradient variational inference (NGVI) under the assumption that variational distributions belong to an exponential family.
  • It proves new non-asymptotic convergence guarantees that cover different combinations of step-size schedules (constant or decreasing) and sample/batch-size schedules (constant or increasing).
  • With fixed hyperparameters, the method is shown to converge geometrically to a neighborhood of the optimum rather than necessarily reaching the exact optimum.
  • For the other schedule combinations, the authors establish convergence to the optimum at rates of the form O(1/T^ρ), where ρ ≥ 1 is attainable depending on the step-size and sample/batch-size schedules.
  • The results apply when the target posterior is “close” to the chosen exponential family, and they are positioned as a principled way to trade off speed, compute resources, and accuracy in NGVI.
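To make the setup above concrete, here is a minimal, hedged sketch of a projected stochastic natural-gradient step for a toy conjugate case: a 1-D Gaussian variational family with natural parameters η = (μ/σ², −1/(2σ²)). In such conjugate settings a natural-gradient step reduces to a convex combination of the current natural parameters and a noisy estimate of the target's natural parameters, followed by a projection that keeps η in the valid exponential-family domain (variance > 0). All function names, constants, and the noise model are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def natural_params(mu, var):
    """Natural parameters of N(mu, var): (mu/var, -1/(2*var))."""
    return np.array([mu / var, -0.5 / var])

def projected_sngvi(lam_hat, eta0, T, step_fn, eta2_cap=-1e-6):
    """Illustrative projected stochastic NGVI loop (toy conjugate case).

    Each iteration moves eta toward a noisy estimate lam_hat(t) of the
    target's natural parameters,
        eta <- Proj((1 - b_t) * eta + b_t * lam_hat(t)),
    where Proj clips the second coordinate below eta2_cap so that the
    implied variance stays positive (the exponential-family domain).
    """
    eta = np.array(eta0, dtype=float)
    for t in range(T):
        b = step_fn(t)                      # step-size schedule
        eta = (1.0 - b) * eta + b * lam_hat(t)
        eta[1] = min(eta[1], eta2_cap)      # projection onto the domain
    return eta

rng = np.random.default_rng(0)
lam = natural_params(mu=2.0, var=0.5)       # target posterior N(2, 0.5)
# Shrinking noise stands in for a growing sample/batch size (assumption).
noisy = lambda t: lam + rng.normal(0.0, 0.2, 2) / np.sqrt(t + 1)
eta_T = projected_sngvi(noisy, eta0=natural_params(0.0, 10.0), T=500,
                        step_fn=lambda t: 0.5)   # constant step size
mu_T = -0.5 * eta_T[0] / eta_T[1]           # recover the mean from eta
```

With a constant step and shrinking gradient noise, the iterate contracts geometrically toward the target's natural parameters, matching the flavor of the geometric-to-a-neighborhood result summarized above.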

Abstract

Stochastic natural gradient variational inference (NGVI) is a popular and efficient algorithm for Bayesian inference. Despite empirical success, the convergence of this method is still not fully understood. In this work, we define and study a projected stochastic NGVI when variational distributions form an exponential family. Stochasticity arises when either gradients are intractable expectations or large sums. We prove new non-asymptotic convergence results for combinations of constant or decreasing step sizes and constant or increasing sample/batch sizes. When all hyperparameters are fixed, NGVI is shown to converge geometrically to a neighborhood of the optimum, while we establish convergence to the optimum with rates of the form \mathcal{O}\left(\frac{1}{T^{\rho}} \right), possibly with \rho \geq 1, for all other combinations of step size and sample/batch size schedules. These rates apply when the target posterior distribution is close in some sense to the considered exponential family. Our theoretical results extend existing NGVI and stochastic optimization results and provide more flexibility to adjust, in a principled way, step sizes and sample/batch sizes in order to meet speed, resources, or accuracy constraints.
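The abstract's schedule combinations can be written down as plain functions of the iteration counter; the sketch below is only an illustration of the four regimes (constant or decreasing step size, constant or increasing sample/batch size), with placeholder constants and decay forms that are assumptions, not values taken from the paper.

```python
def constant_step(t, b0=0.5):
    return b0                       # fixed step size

def decreasing_step(t, b0=0.5):
    return b0 / (t + 1)             # O(1/t) decay (one common choice)

def constant_batch(t, n0=32):
    return n0                       # fixed sample/batch size

def increasing_batch(t, n0=32):
    return n0 * (t + 1)             # linearly growing batch (one choice)

# Regime labels are informal summaries of the results described above.
schedules = {
    "geometric, to a neighborhood": (constant_step, constant_batch),
    "to the optimum (decaying step)": (decreasing_step, constant_batch),
    "to the optimum (growing batch)": (constant_step, increasing_batch),
}
```

In practice one would pass a pair from `schedules` into an NGVI loop to trade off speed, compute, and accuracy, as the paper's guarantees suggest.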