A Gaussian Process View on Observation Noise and Initialization in Wide Neural Networks
arXiv stat.ML / 4/2/2026
Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper reformulates gradient descent training of wide neural networks as computing the posterior mean of a Gaussian process whose covariance is the Neural Tangent Kernel (the NTK-GP), but notes that prior work assumes zero observation noise and a restricted prior mean tied to the network's output at initialization.
- It introduces a training regularizer shown to correspond to adding observation noise in the NTK-GP posterior (see the formulas after this list), yielding a better-specified model for noisy data.
- To lift the restriction on the prior mean, the authors propose a “shifted network” construction that supports any desired prior mean while still allowing the posterior mean to be estimated by gradient descent on a single network (a sketch follows below).
- The approach is evaluated across multiple datasets and architectures, and the reported results indicate it removes the main practical barriers to using the NTK-GP equivalence for applied Gaussian process modeling.
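For reference, these are the standard Gaussian process regression formulas the key points refer to, written in generic notation rather than the paper's own. Prior work on the NTK-GP correspondence shows that gradient descent on a sufficiently wide network converges to the noiseless posterior mean with prior mean given by the network's output at initialization f_0; adding an observation-noise variance σ² gives the posterior mean that the summarized regularizer is said to target.

```latex
% Noiseless limit reached by gradient descent on a wide network (prior work),
% with NTK \Theta and output at initialization f_0:
f(x_*) = f_0(x_*) + \Theta(x_*, X)\,\Theta(X, X)^{-1}\bigl(y - f_0(X)\bigr)

% NTK-GP posterior mean with observation noise \sigma^2 and desired prior mean m:
\mu(x_*) = m(x_*) + \Theta(x_*, X)\,\bigl[\Theta(X, X) + \sigma^2 I_n\bigr]^{-1}\bigl(y - m(X)\bigr)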
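As a concrete illustration of what such a posterior mean computes, here is a minimal NumPy sketch of closed-form GP regression with observation noise and an arbitrary prior mean. The function names, the RBF kernel standing in for the NTK, and the toy data are illustrative assumptions, not the paper's code.

```python
import numpy as np

def posterior_mean(kernel, X_train, y_train, X_test, noise_var=0.0, prior_mean=None):
    """Closed-form GP posterior mean under `kernel` (illustrative helper, not the paper's API).

    With noise_var = 0 and prior_mean set to the network's output at initialization,
    this is the quantity gradient descent on a wide network is known to converge to;
    noise_var > 0 gives the noisy NTK-GP posterior the summarized regularizer targets.
    """
    if prior_mean is None:
        prior_mean = lambda X: np.zeros(len(X))
    K_tt = kernel(X_train, X_train)                 # n x n Gram matrix on training inputs
    K_st = kernel(X_test, X_train)                  # m x n cross-covariances
    A = K_tt + noise_var * np.eye(len(X_train))     # add observation noise on the diagonal
    alpha = np.linalg.solve(A, y_train - prior_mean(X_train))
    return prior_mean(X_test) + K_st @ alpha

def rbf(X1, X2, lengthscale=1.0):
    """Toy RBF kernel on 1-D inputs, standing in for the NTK."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale ** 2)

# Toy usage: noisy sine data, prior mean set to the true sine function.
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 20)
y = np.sin(X) + 0.1 * rng.standard_normal(20)
X_star = np.linspace(-3, 3, 100)
mu = posterior_mean(rbf, X, y, X_star, noise_var=0.1, prior_mean=np.sin)
```

The “shifted network” idea in the third key point can be read against this formula: a network of the form f̃(x) = f(x) - f_0(x) + m(x) equals m exactly at initialization, so training it by gradient descent would plausibly yield a posterior mean whose prior mean is m rather than the random f_0. This construction is an assumption inferred from the summary, not a quotation of the paper's definition.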