Singular Bayesian Neural Networks

arXiv stat.ML / 5/5/2026


Key Points

  • The paper argues that standard Bayesian neural networks are often over-parameterized because mean-field Gaussian posteriors require O(mn) parameters, even when the true weight structure is effectively low-rank.
  • It proposes “singular” Bayesian neural networks by factorizing weight matrices as W = AB^T, producing a posterior concentrated on a rank-r manifold and capturing correlated weight structure via shared latent factors (a minimal sketch of such a layer appears after this list).
  • The authors derive PAC-Bayes generalization and loss bounds, with complexity scaling roughly as √(r(m+n)) rather than √(mn), and decompose the resulting error into an optimization term and a rank-induced bias term.
  • They adapt low-rank Gaussian complexity results to Bayesian predictive means and show empirically that the approach achieves competitive predictive performance with up to 33× fewer parameters than 5-member Deep Ensembles.
  • Experiments indicate improved out-of-distribution (OOD) detection and often better calibration than mean-field and perturbation baselines, though Deep Ensembles can remain stronger for in-distribution likelihood metrics.
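
To make the factorization concrete, below is a minimal PyTorch sketch of a low-rank variational linear layer in the spirit of the paper. This is an illustration under assumptions, not the authors' implementation: the class name `LowRankBayesLinear`, the softplus parameterization of the standard deviations, and the N(0, prior_std²) prior are all choices made here for clarity.

```python
# Minimal sketch of a low-rank ("singular") variational linear layer,
# assuming mean-field Gaussian posteriors over the factor matrices A and B.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankBayesLinear(nn.Module):
    """y = x W^T + b with W = A B^T, A in R^{m x r}, B in R^{n x r}.

    The induced posterior over W is supported on the rank-r manifold, so it
    is singular w.r.t. the Lebesgue measure on R^{m x n}; correlations
    between entries of W arise from the shared latent factors.
    """

    def __init__(self, in_features: int, out_features: int, rank: int,
                 prior_std: float = 1.0):
        super().__init__()
        m, n, r = out_features, in_features, rank
        # Variational parameters: means and pre-softplus scales per factor entry.
        self.mu_A = nn.Parameter(torch.randn(m, r) * 0.1)
        self.rho_A = nn.Parameter(torch.full((m, r), -3.0))
        self.mu_B = nn.Parameter(torch.randn(n, r) * 0.1)
        self.rho_B = nn.Parameter(torch.full((n, r), -3.0))
        self.bias = nn.Parameter(torch.zeros(m))
        self.prior_std = prior_std

    def _sample(self, mu, rho):
        std = F.softplus(rho)                    # keep std strictly positive
        return mu + std * torch.randn_like(std)  # reparameterization trick

    def forward(self, x):
        A = self._sample(self.mu_A, self.rho_A)
        B = self._sample(self.mu_B, self.rho_B)
        W = A @ B.T                              # rank-r weight sample
        return F.linear(x, W, self.bias)

    def kl(self):
        # KL(q || p) for factorized Gaussians against a N(0, prior_std^2)
        # prior, summed over the 2r(m+n) factor entries (not the m*n weights).
        total = 0.0
        for mu, rho in [(self.mu_A, self.rho_A), (self.mu_B, self.rho_B)]:
            std = F.softplus(rho)
            total = total + (
                torch.log(self.prior_std / std)
                + (std ** 2 + mu ** 2) / (2 * self.prior_std ** 2) - 0.5
            ).sum()
        return total
```

Note the parameter count: the layer stores 2r(m+n) variational parameters for the factors (means plus scales) instead of the 2mn a full mean-field Gaussian over W would need. Summing `kl()` over layers and adding it to the negative log-likelihood yields the usual ELBO-style training objective.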

Abstract

Bayesian neural networks promise calibrated uncertainty but require O(mn) parameters for standard mean-field Gaussian posteriors. We argue this cost is often unnecessary, particularly when weight matrices exhibit fast singular value decay. By parameterizing weights as W = AB^T with A ∈ ℝ^{m×r} and B ∈ ℝ^{n×r}, we induce a posterior that is *singular* with respect to the Lebesgue measure, concentrating on the rank-r manifold. This singularity captures structured weight correlations through shared latent factors, geometrically distinct from mean-field's independence assumption. We derive PAC-Bayes generalization bounds whose complexity term scales as √(r(m+n)) instead of √(mn), and prove loss bounds that decompose the error into optimization and rank-induced bias using the Eckart–Young–Mirsky theorem. We further adapt recent Gaussian complexity bounds for low-rank deterministic networks to Bayesian predictive means. Empirically, across MLPs, LSTMs, and Transformers on standard benchmarks, our method achieves competitive predictive performance while using up to 33× fewer parameters than 5-member Deep Ensembles. It substantially improves OOD detection and often improves calibration relative to mean-field and perturbation baselines, while Deep Ensembles can still be stronger on in-distribution likelihood-based metrics.
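
For intuition on where the √(r(m+n)) scaling comes from, recall the generic McAllester-style PAC-Bayes template (shown schematically below; the paper's precise bound and constants are not reproduced here): with probability at least 1 − δ over an i.i.d. sample of size N, for every posterior Q,

$$
L(Q) \;\le\; \widehat{L}(Q) + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln(N/\delta)}{2(N-1)}}.
$$

Because the factorized posterior Q lives on the roughly r(m+n)-dimensional parameter space of (A, B), the KL divergence between Gaussians scales as O(r(m+n)) rather than O(mn), which is what moves the complexity term from √(mn) to √(r(m+n)).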