Finite Neural Networks as Mixtures of Gaussian Processes: From Provable Error Bounds to Prior Selection

arXiv stat.ML / 2026/3/24


Key Points

  • The paper proposes an algorithmic framework to approximate finite-width/depth neural networks (including non-i.i.d. parameters) using a mixture of Gaussian processes rather than relying only on the infinite-width/depth GP equivalence limit.
  • It uses the Wasserstein distance to quantify the closeness between the NN and the GP-based approximation, iteratively approximating the output distribution of each network layer with provable approximation-error bounds.
  • The authors show that for any neural network and any ε>0, the method can produce a mixture of Gaussian processes that is ε-close to the NN at a finite set of input points.
  • By leveraging differentiability of the error bound, the approach can be used to tune neural network parameters to mimic a target Gaussian process, supporting tasks such as prior selection in Bayesian inference.
  • Experiments on regression and classification demonstrate the practical effectiveness of the approximations and suggest a route toward more formally quantified uncertainty in NN predictions.
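The error metric driving the framework is the Wasserstein distance between probability distributions, which for Gaussians has a well-known closed form. As a minimal sketch (not code from the paper), the squared 2-Wasserstein distance between two multivariate Gaussians N(m1, S1) and N(m2, S2) can be computed as:

```python
import numpy as np
from scipy.linalg import sqrtm

def w2_gaussians(m1, S1, m2, S2):
    """Squared 2-Wasserstein distance between N(m1, S1) and N(m2, S2).

    Uses the closed form
        W2^2 = ||m1 - m2||^2 + Tr(S1 + S2 - 2 (S2^{1/2} S1 S2^{1/2})^{1/2}).
    """
    d_mean = np.sum((np.asarray(m1) - np.asarray(m2)) ** 2)
    root_S2 = sqrtm(S2)
    cross = sqrtm(root_S2 @ S1 @ root_S2)
    # sqrtm can return tiny imaginary parts due to numerical error
    return float(d_mean + np.trace(S1 + S2 - 2 * np.real(cross)))

# Identical Gaussians are at distance zero; shifting a mean adds its
# squared norm while the covariance term stays zero.
S = np.array([[1.0, 0.3], [0.3, 1.0]])
print(w2_gaussians([0, 0], S, [0, 0], S))   # ≈ 0.0
print(w2_gaussians([0, 0], S, [1, 0], S))   # ≈ 1.0
```

This closed form is what makes Gaussian (and mixture-of-Gaussian) approximations attractive as targets: the distance, and hence an error bound built from it, is cheap to evaluate and differentiable in the means and covariances.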

Abstract

Infinitely wide or deep neural networks (NNs) with independent and identically distributed (i.i.d.) parameters have been shown to be equivalent to Gaussian processes. Because of the favorable properties of Gaussian processes, this equivalence is commonly employed to analyze neural networks and has led to various breakthroughs over the years. However, neural networks and Gaussian processes are equivalent only in the limit; in the finite case there are currently no methods available to approximate a trained neural network with a Gaussian model with bounds on the approximation error. In this work, we present an algorithmic framework to approximate a neural network of finite width and depth, and with not necessarily i.i.d. parameters, with a mixture of Gaussian processes, with bounds on the approximation error. In particular, we consider the Wasserstein distance to quantify the closeness between probabilistic models and, by relying on tools from optimal transport and Gaussian processes, we iteratively approximate the output distribution of each layer of the neural network as a mixture of Gaussian processes. Crucially, for any NN and ε > 0, our approach is able to return a mixture of Gaussian processes that is ε-close to the NN at a finite set of input points. Furthermore, we rely on the differentiability of the resulting error bound to show how our approach can be employed to tune the parameters of a NN to mimic the functional behavior of a given Gaussian process, e.g., for prior selection in the context of Bayesian inference. We empirically investigate the effectiveness of our results on both regression and classification problems with various neural network architectures. Our experiments highlight how our results can represent an important step towards understanding neural network predictions and formally quantifying their uncertainty.
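The prior-selection idea in the abstract — using the differentiability of the error bound to tune NN parameters toward a target GP — can be illustrated with a toy stand-in. Everything below is my own construction, not the paper's method: a one-layer random-feature model whose output at a finite input set is Gaussian with covariance s² ΦΦᵀ, where the scalar prior scale s is tuned by gradient descent (finite differences here, rather than the paper's analytical bound) on the squared 2-Wasserstein distance to a target RBF GP prior:

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(0)

# Target GP prior at a finite set of inputs: zero mean, RBF kernel with
# amplitude 2.0 and lengthscale 1.0 (values chosen purely for illustration).
X = np.linspace(-2.0, 2.0, 5)
amp, ls = 2.0, 1.0
K_target = amp**2 * np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2 / ls**2)

# "Network": f(x) = w . phi(x) with w ~ N(0, s^2 I), where phi are random
# Fourier features, so Phi Phi^T approximates the unit-amplitude RBF kernel
# and the prior over outputs at X is N(0, s^2 Phi Phi^T).
H = 2000
omega = rng.normal(0.0, 1.0 / ls, H)
bias = rng.uniform(0.0, 2.0 * np.pi, H)
Phi = np.sqrt(2.0 / H) * np.cos(np.outer(X, omega) + bias)
M = Phi @ Phi.T

def w2_sq(s):
    """Squared 2-Wasserstein distance between N(0, s^2 M) and N(0, K_target)."""
    C1 = s**2 * M
    rootK = sqrtm(K_target)
    cross = np.real(sqrtm(rootK @ C1 @ rootK))
    return float(np.trace(C1 + K_target - 2 * cross))

# Gradient descent on the prior scale via central finite differences.
s, lr, eps = 0.5, 0.05, 1e-4
for _ in range(200):
    grad = (w2_sq(s + eps) - w2_sq(s - eps)) / (2 * eps)
    s -= lr * grad

print(s)  # close to the target amplitude 2.0
```

Because ΦΦᵀ approximates the unit-amplitude RBF Gram matrix, the optimization recovers a prior scale near the target amplitude — the same "fit an NN prior to a reference GP" workflow the paper enables, but with the paper's differentiable bound replaced by a directly computed Gaussian-to-Gaussian distance.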