A Tale of Two Variances: When Single-Seed Benchmarks Fail in Bayesian Deep Learning
arXiv cs.LG · April 28, 2026
Key Points
- In small-data Bayesian deep learning, reporting a single-seed evaluation metric value (e.g., CRPS) as if it were deterministic can be misleading because the metric’s endpoint is a random variable.
- Across 50 independent runs on six regression datasets, CRPS variance trajectories vary widely by method: MAP and Deep Ensembles can show reproducible variance peaks at intermediate training sizes, while MC Dropout and Bayes by Backprop typically exhibit smooth variance decay.
- These variance peaks materially affect reliability—for example, on the Seoul Bike dataset the relative RMSE of a single-seed MAP estimate reaches 93.6%, and the chance of landing within ±10% of the repeated-run mean falls to 5.9%.
- Local CRPS variance is a strong predictor of single-seed estimation error (Spearman correlation >0.96 across the real datasets), and switching the training objective to β-NLL substantially reduces the irregular variance behavior.
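The reliability statistics quoted above (relative RMSE of a single-seed estimate and the chance of landing within ±10% of the repeated-run mean) can be sketched in a few lines. This is an illustrative reconstruction, not the paper's code: the function name `single_seed_reliability` and the synthetic metric values are assumptions for the example.

```python
import numpy as np

def single_seed_reliability(metric_values, tol=0.10):
    """Given one metric value (e.g., CRPS) per independent seed, report how
    reliable a single-seed estimate of that metric would be.

    Returns (relative RMSE w.r.t. the repeated-run mean,
             fraction of seeds landing within ±tol of that mean)."""
    m = np.asarray(metric_values, dtype=float)
    mean = m.mean()
    # Relative RMSE of a single-seed estimate against the repeated-run mean.
    rel_rmse = np.sqrt(np.mean((m - mean) ** 2)) / mean
    # Chance that one seed lands within ±tol (e.g., ±10%) of the mean.
    hit_rate = np.mean(np.abs(m - mean) <= tol * mean)
    return rel_rmse, hit_rate

# Synthetic stand-ins for 50 repeated runs of two hypothetical methods:
# a smooth variance-decay regime vs. a variance-peak regime.
rng = np.random.default_rng(0)
smooth = rng.normal(1.0, 0.02, size=50)  # e.g., MC-Dropout-like behavior
spiky = rng.normal(1.0, 0.60, size=50)   # e.g., a MAP-style variance peak

for name, vals in [("smooth", smooth), ("spiky", spiky)]:
    rel, hit = single_seed_reliability(vals)
    print(f"{name}: rel. RMSE {rel:.1%}, within ±10% of mean {hit:.1%}")
```

Under the spiky regime, the relative RMSE of a single seed is large and the ±10% hit rate collapses, mirroring the Seoul Bike numbers (93.6% relative RMSE, 5.9% hit rate) reported in the paper.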