A Gaussian Process View on Observation Noise and Initialization in Wide Neural Networks

arXiv stat.ML / 4/2/2026


Key Points

  • The paper reformulates gradient descent in wide neural networks as computing a posterior mean in a Gaussian Process using the Neural Tangent Kernel (NTK-GP), but notes that prior work assumes zero observation noise and a restricted prior mean.
  • It introduces a training regularizer that is shown to correspond to adding observation noise within the NTK-GP framework, improving model specification on noisy data.
  • To overcome limitations with arbitrary prior means, the authors propose a “shifted network” construction that supports any desired prior mean while still allowing posterior-mean estimation with gradient descent on a single network.
  • The approach is evaluated experimentally across multiple datasets and architectures, and the results indicate it removes major practical barriers to using the NTK-GP equivalence for applied Gaussian process modeling.

Abstract

Performing gradient descent in a wide neural network is equivalent to computing the posterior mean of a Gaussian Process with the Neural Tangent Kernel (NTK-GP), for a specific prior mean and with zero observation noise. However, existing formulations have two limitations: (i) the NTK-GP assumes noiseless targets, leading to misspecification on noisy data; (ii) the equivalence does not extend to arbitrary prior means, which are essential for well-specified models. To address (i), we introduce a regularizer into the training objective, showing its correspondence to incorporating observation noise in the NTK-GP. To address (ii), we propose a “shifted network” that enables arbitrary prior means and allows obtaining the posterior mean with gradient descent on a single network, without ensembling or kernel inversion. We validate our results with experiments across datasets and architectures, showing that this approach removes key obstacles to the practical use of NTK-GP equivalence in applied Gaussian process modeling.
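For concreteness, the two fixes described in the abstract map onto the standard GP posterior-mean formula: observation noise enters as a σ²I term added to the kernel matrix, and an arbitrary prior mean m(·) enters by regressing the residuals y − m(X) and adding m back at test points. The sketch below is a minimal NumPy illustration of that formula, using a toy RBF kernel as a stand-in for the architecture-dependent NTK; the function names and the kernel choice are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    """Toy kernel as a stand-in for the NTK (the real NTK is architecture-dependent)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

def gp_posterior_mean(X_train, y, X_test, prior_mean, noise_var):
    """GP posterior mean with prior mean m(.) and observation noise sigma^2:
         mu(x*) = m(x*) + K(x*, X) @ (K(X, X) + sigma^2 I)^{-1} @ (y - m(X))
    noise_var > 0 is the effect the paper's regularizer emulates; the m(.) shift
    is the effect the "shifted network" construction emulates."""
    K = rbf_kernel(X_train, X_train)
    K_star = rbf_kernel(X_test, X_train)
    resid = y - prior_mean(X_train)
    alpha = np.linalg.solve(K + noise_var * np.eye(len(X_train)), resid)
    return prior_mean(X_test) + K_star @ alpha

X = np.linspace(0, 1, 5)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
zero_mean = lambda Z: np.zeros(len(Z))

# With zero noise the posterior mean interpolates the training targets exactly,
# matching the noiseless NTK-GP equivalence; positive noise smooths it away.
mu_noiseless = gp_posterior_mean(X, y, X, zero_mean, noise_var=0.0)
mu_noisy = gp_posterior_mean(X, y, X, zero_mean, noise_var=0.5)
```

Note the design point the abstract emphasizes: the network-side constructions avoid forming and inverting K(X, X) explicitly; the `np.linalg.solve` here is only for illustrating what quantity gradient descent is claimed to recover.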
