Stepwise Variational Inference with Vine Copulas

arXiv stat.ML · March 25, 2026


Key Points

  • The paper introduces a universal stepwise variational inference (VI) method that integrates vine copulas with a new stepwise estimation procedure for variational parameters.
  • Vine copulas are built as a nested sequence of trees, and the approach estimates the approximate posterior tree-by-tree along this vine structure to capture increasingly complex dependencies.
  • It argues that standard VI using backward Kullback–Leibler divergence can fail to recover correct vine copula parameters, so it defines the evidence lower bound using Rényi divergence instead.
  • The method includes an intuitive stopping criterion for adding further vine trees, avoiding the need to pre-specify a complexity parameter for the variational distribution.
  • Experiments in applications such as sparse Gaussian processes suggest the approach is parameter-efficient and can outperform mean-field VI while interpolating toward full latent dependence.
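To make the Rényi-based objective concrete, here is a minimal toy sketch (not code from the paper): the Rényi variational bound L_α = (1/(1−α)) · log E_q[(p(x,z)/q(z))^(1−α)] estimated by Monte Carlo on a conjugate Gaussian model whose exact log evidence is known in closed form. The model, the choice α = 0.5, and the sample size are all illustrative assumptions.

```python
import math
import random

# Toy conjugate model (illustrative only, not the paper's setup):
#   prior z ~ N(0, 1), likelihood x | z ~ N(z, 1), observation x = 1.
# Exact posterior: N(0.5, 0.5); exact log evidence: log N(1; 0, 2).
X = 1.0
LOG_EVIDENCE = -0.5 * math.log(2 * math.pi * 2.0) - X ** 2 / 4.0

def log_normal_pdf(v, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)

def renyi_bound(m, s2, alpha=0.5, n=50_000, seed=0):
    """Monte Carlo estimate of the Renyi variational bound
    L_alpha = 1/(1 - alpha) * log E_q[(p(x, z) / q(z))**(1 - alpha)]
    for a Gaussian q(z) = N(m, s2), using log-sum-exp for stability."""
    rng = random.Random(seed)
    scaled_log_w = []
    for _ in range(n):
        z = rng.gauss(m, math.sqrt(s2))
        log_w = (log_normal_pdf(z, 0.0, 1.0)   # log prior
                 + log_normal_pdf(X, z, 1.0)   # log likelihood
                 - log_normal_pdf(z, m, s2))   # minus log q
        scaled_log_w.append((1 - alpha) * log_w)
    mx = max(scaled_log_w)
    log_mean = mx + math.log(sum(math.exp(v - mx) for v in scaled_log_w) / n)
    return log_mean / (1 - alpha)

# With q equal to the exact posterior the importance ratio is constant,
# so the bound is tight; a mismatched q (here, the prior) stays below.
tight = renyi_bound(0.5, 0.5)
loose = renyi_bound(0.0, 1.0)
```

For 0 < α < 1 this bound lies between the standard ELBO and the true log evidence, which is why the tight/loose comparison above behaves as expected.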

Abstract

We propose stepwise variational inference (VI) with vine copulas: a universal VI procedure that combines vine copulas with a novel stepwise estimation procedure for the variational parameters. Vine copulas consist of a nested sequence of trees built from pair copulas, where more complex latent dependence can be modeled with an increasing number of trees. We propose to estimate the vine copula approximate posterior in a stepwise fashion, tree by tree along the vine structure. Further, we show that the usual backward Kullback–Leibler divergence cannot recover the correct parameters in the vine copula model, so the evidence lower bound is defined based on the Rényi divergence instead. Finally, an intuitive stopping criterion for adding further trees to the vine eliminates the need to pre-define a complexity parameter of the variational distribution, as required by most other approaches. Thus, our method interpolates between mean-field VI (MFVI) and full latent dependence. In many applications, in particular sparse Gaussian processes, our method is parsimonious with parameters while outperforming MFVI.
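To illustrate what "tree by tree" means, the following is a hedged sketch of stepwise estimation for a three-dimensional Gaussian vine fitted to data (the paper instead targets a variational posterior, so this is an analogy, not the paper's algorithm): tree 1 fits the unconditional pairs (1,2) and (2,3); tree 2 then fits the conditional pair (1,3 | 2) on pseudo-observations produced by the Gaussian-copula h-function. All correlation values and sample sizes are illustrative.

```python
import math
import random
from statistics import NormalDist

ND = NormalDist()  # standard normal: .cdf and .inv_cdf

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def h(u, v, rho):
    """Gaussian-copula h-function: conditional CDF of U given V = v."""
    a, b = ND.inv_cdf(u), ND.inv_cdf(v)
    return ND.cdf((a - rho * b) / math.sqrt(1 - rho ** 2))

# Simulate a trivariate Gaussian with known correlations (toy values),
# then map margins to uniforms to obtain copula data.
R12, R23, R13 = 0.6, 0.5, 0.4
l21, l22 = R12, math.sqrt(1 - R12 ** 2)       # Cholesky factors by hand
l31 = R13
l32 = (R23 - R12 * R13) / l22
l33 = math.sqrt(1 - l31 ** 2 - l32 ** 2)
rng = random.Random(1)
U = []
for _ in range(20_000):
    e1, e2, e3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    z1 = e1
    z2 = l21 * e1 + l22 * e2
    z3 = l31 * e1 + l32 * e2 + l33 * e3
    U.append((ND.cdf(z1), ND.cdf(z2), ND.cdf(z3)))

def scores(col):
    return [ND.inv_cdf(u[col]) for u in U]

# Tree 1: fit the unconditional pair copulas (1,2) and (2,3).
s1, s2, s3 = scores(0), scores(1), scores(2)
rho12_hat = pearson(s1, s2)
rho23_hat = pearson(s2, s3)

# Tree 2: conditional pseudo-observations via h, then fit (1,3 | 2).
u1g2 = [h(u[0], u[1], rho12_hat) for u in U]
u3g2 = [h(u[2], u[1], rho23_hat) for u in U]
rho13_2_hat = pearson([ND.inv_cdf(u) for u in u1g2],
                      [ND.inv_cdf(u) for u in u3g2])

# True partial correlation implied by R12, R23, R13, for comparison:
rho13_2 = (R13 - R12 * R23) / math.sqrt((1 - R12 ** 2) * (1 - R23 ** 2))
```

The stepwise structure is visible in the data flow: tree 2 only sees pseudo-observations built from tree 1's fitted parameters, which is what lets the procedure stop after any tree (stopping early here would correspond to assuming conditional independence of variables 1 and 3 given 2).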