Improving Infinitely Deep Bayesian Neural Networks with Nesterov's Accelerated Gradient Method

arXiv cs.LG / 3/27/2026


Key Points

  • The paper targets SDE-based Bayesian neural networks, arguing that their reliance on numerical SDE solvers incurs a large number of function evaluations (NFEs) and can cause convergence instability.
  • It proposes an improved SDE-BNN architecture that incorporates Nesterov-accelerated gradient (NAG) together with an NFE-dependent residual skip connection.
  • The method is designed to accelerate convergence while substantially reducing NFEs during both training and inference.
  • Experiments across tasks such as image classification and sequence modeling reportedly show consistent performance gains over conventional SDE-BNNs, with both lower NFEs and higher predictive accuracy.
  • Overall, the work presents a practical optimization/architecture enhancement for Bayesian continuous-depth neural network models with a focus on computational efficiency and stability.
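To make the NAG component concrete, here is a minimal sketch of Nesterov's accelerated gradient update, the optimizer the paper integrates into the SDE-BNN framework. The toy quadratic objective and the hyperparameter values are illustrative choices, not taken from the paper.

```python
import numpy as np

def nag_minimize(grad, x0, lr=0.1, momentum=0.9, steps=100):
    """Nesterov's accelerated gradient: unlike classical momentum,
    the gradient is evaluated at the look-ahead point x + momentum * v."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        v = momentum * v - lr * grad(x + momentum * v)  # look-ahead gradient
        x = x + v
    return x

# Toy quadratic f(x) = 0.5 * ||x - target||^2 with gradient x - target
# (a stand-in for the model's loss surface).
target = np.array([1.0, -2.0])
x_star = nag_minimize(lambda x: x - target, x0=np.zeros(2))
```

The look-ahead evaluation is what gives NAG its accelerated convergence rate over plain gradient descent on smooth convex problems, which is the property the paper exploits at the architecture level.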

Abstract

As a representative continuous-depth neural network approach, stochastic differential equation (SDE)-based Bayesian neural networks (BNNs) have attracted considerable attention due to their solid theoretical foundations and strong potential for real-world applications. However, their reliance on numerical SDE solvers inevitably incurs a large number of function evaluations (NFEs), resulting in high computational cost and occasional convergence instability. To address these challenges, we propose a Nesterov-accelerated gradient (NAG) enhanced SDE-BNN model. By integrating NAG into the SDE-BNN framework along with an NFE-dependent residual skip connection, our method accelerates convergence and substantially reduces NFEs during both training and testing. Extensive empirical results show that our model consistently outperforms conventional SDE-BNNs across various tasks, including image classification and sequence modeling, achieving lower NFEs and improved predictive accuracy.
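The NFE cost the abstract refers to can be illustrated with a basic Euler–Maruyama SDE solver that counts its function evaluations. In an SDE-BNN, every forward pass runs such a solver, so each integration step costs one drift and one diffusion evaluation; the drift and diffusion below are toy stand-ins, not the paper's learned networks.

```python
import numpy as np

def euler_maruyama(drift, diffusion, x0, t0, t1, n_steps, rng):
    """Integrate dX = drift(X,t) dt + diffusion(X,t) dW with the
    Euler-Maruyama scheme, returning the final state and the NFE count."""
    x = np.asarray(x0, dtype=float)
    dt = (t1 - t0) / n_steps
    t, nfe = t0, 0
    for _ in range(n_steps):
        dw = rng.normal(scale=np.sqrt(dt), size=x.shape)
        x = x + drift(x, t) * dt + diffusion(x, t) * dw
        nfe += 2  # one drift + one diffusion evaluation per step
        t += dt
    return x, nfe

rng = np.random.default_rng(0)
# Ornstein-Uhlenbeck-like toy dynamics: mean-reverting drift, constant noise.
x_T, nfe = euler_maruyama(lambda x, t: -x, lambda x, t: 0.1,
                          x0=np.ones(3), t0=0.0, t1=1.0, n_steps=50, rng=rng)
```

The NFE count grows linearly with the number of solver steps (and further with adaptive solvers that reject steps), which is why reducing NFEs during both training and testing translates directly into lower computational cost.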
