Inversion-Free Natural Gradient Descent on Riemannian Manifolds

arXiv stat.ML / 4/6/2026


Key Points

  • The paper introduces an inversion-free stochastic natural gradient descent method tailored to probability distributions with parameters defined on Riemannian manifolds, avoiding the usual Euclidean-space assumptions.
  • It formulates Fisher information intrinsically on the manifold and maintains an online approximation to the inverse Fisher matrix using successive score samples, updated with quadratic-cost steps that incorporate transport between different tangent spaces.
  • The method leverages manifold structure to implicitly satisfy constraints (e.g., positive definiteness and orthogonality) and to support desirable properties such as identifiability and geodesic convexity.
  • The authors provide almost-sure convergence-rate guarantees for optimization (and for the approximate Fisher information), including conditions on the step-size exponent and error terms arising from transport.
  • They also propose a limited-memory variant and empirically show improved effectiveness over Euclidean baselines in tasks including variational Bayes with Gaussian approximations and normalizing flows.
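The inverse-FIM recursion described above can be illustrated with a small sketch. The paper's exact update rule is not reproduced here; this is a generic Sherman–Morrison-style rank-one update of an inverse-FIM estimate from successive score samples, which achieves the quadratic per-step cost mentioned in the key points. The forgetting rate `beta`, the toy Gaussian model, and the identity transport map are all illustrative assumptions (on a flat manifold the transport between tangent spaces reduces to the identity; in the Riemannian case it would be a genuine parallel-transport operator).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
beta = 0.01  # forgetting rate for the online FIM estimate (hypothetical choice)

# Running estimate H of the inverse Fisher information matrix.
H = np.eye(d)

theta = np.zeros(d)
for k in range(5000):
    # Toy model: x ~ N(theta, I). The score of the log-density w.r.t. theta is
    # (x - theta), so the true FIM is the identity and H should stay near I.
    x = rng.normal(theta, 1.0)
    s = x - theta

    # In the Riemannian setting, H lives in the previous iterate's tangent space
    # and must first be transported; on this flat toy problem the transport map
    # is the identity, so the conjugation below is a no-op placeholder.
    T = np.eye(d)
    H = T @ H @ T.T

    # Sherman-Morrison update of H = F^{-1} for F <- (1 - beta) F + beta s s^T,
    # i.e. an inversion-free rank-one update at O(d^2) cost per step.
    A = H / (1.0 - beta)
    As = A @ s
    H = A - beta * np.outer(As, As) / (1.0 + beta * (s @ As))

print(np.linalg.norm(H - np.eye(d)))  # small: H tracks the true inverse FIM
```

The same conjugate-by-transport pattern is what the limited-memory variant would apply to a low-rank factor of H instead of the full matrix, reducing storage below quadratic.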

Abstract

The natural gradient method is widely used in statistical optimization, but its standard formulation assumes a Euclidean parameter space. This paper proposes an inversion-free stochastic natural gradient method for probability distributions whose parameters lie on a Riemannian manifold. The manifold setting offers several advantages: one can implicitly enforce parameter constraints such as positive definiteness and orthogonality, ensure parameters are identifiable, or guarantee regularity properties of the objective like geodesic convexity. Building on an intrinsic formulation of the Fisher information matrix (FIM) on a manifold, our method maintains an online approximation of the inverse FIM, which is efficiently updated at quadratic cost using score vectors sampled at successive iterates. In the Riemannian setting, these score vectors belong to different tangent spaces and must be combined using transport operations. We prove almost-sure convergence rates of $O(\log s / s^{\alpha})$ for the squared distance to the minimizer when the step-size exponent $\alpha > 2/3$. We also establish almost-sure rates for the approximate FIM, which now accumulates transport-based errors. A limited-memory variant of the algorithm with sub-quadratic storage complexity is proposed. Finally, we demonstrate the effectiveness of our method relative to its Euclidean counterparts on variational Bayes with Gaussian approximations and normalizing flows.
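To make the geometric ingredients of the abstract concrete, here is a minimal sketch of a single preconditioned Riemannian step on the unit sphere. The retraction, the tangent-space projection, and the step-size schedule $\gamma_k = c / (k+1)^{\alpha}$ with $\alpha > 2/3$ are standard building blocks chosen for illustration; the paper's actual manifolds, retractions, and preconditioner update are not specified here, and `H` stands in for the running inverse-FIM estimate.

```python
import numpy as np

def retract_sphere(theta, v):
    """Retraction on the unit sphere: move in the tangent direction, renormalize."""
    y = theta + v
    return y / np.linalg.norm(y)

def natural_step(theta, score, H, gamma):
    """One hypothetical inversion-free natural-gradient step: project the score
    onto the tangent space at theta, precondition it by the inverse-FIM estimate
    H (a matrix-vector product, no linear solve), and retract onto the manifold."""
    g = score - (score @ theta) * theta  # tangent-space projection at theta
    return retract_sphere(theta, -gamma * (H @ g))

theta = np.array([1.0, 0.0, 0.0])
H = np.eye(3)  # placeholder inverse-FIM estimate (would be updated online)
c, alpha = 0.5, 0.75  # alpha > 2/3, matching the step-size condition in the abstract
for k in range(10):
    score = np.array([0.0, 1.0, 0.0])  # dummy score vector for illustration
    theta = natural_step(theta, score, H, gamma=c / (k + 1) ** alpha)

print(np.linalg.norm(theta))  # the iterate stays on the manifold (norm = 1)
```

Because the update moves through the tangent space and a retraction, constraints like unit norm (or, on other manifolds, positive definiteness and orthogonality) hold by construction rather than by explicit projection or penalty terms.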