Occam's Razor Is Only as Sharp as Your ELBO

arXiv cs.LG / 4/30/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper frames the marginal likelihood (“evidence”) as a mathematical embodiment of Occam’s razor, enabling model selection that avoids overfitting.
  • It shows that ELBO-based hyperparameter learning can either underfit or overfit, depending on what is assumed about the approximate posterior, specifically the rank of the covariance matrix in a Gaussian approximation (see the sketch after this list).
  • In an over-parameterized regression setting, Bayesian model selection using the evidence may sometimes choose the overfit solution, even when the ELBO-based method does not.
  • The authors warn that practitioners scaling Bayesian methods to large models should consider carefully how the reduced-rank assumptions made for tractability in variational inference can undermine reliable model selection.
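
To make the mechanism concrete, here is a minimal sketch in Python/NumPy of the kind of comparison at stake: selecting the prior precision of a conjugate Bayesian linear regression either by the exact log evidence or by the ELBO of a Gaussian approximate posterior restricted to a diagonal covariance. The dimensions, the fixed noise precision, and the data-generating process are illustrative assumptions, and the diagonal covariance is only a simple stand-in for a restricted covariance family; the paper's analysis concerns the covariance rank, and this is not its experimental setup.

```python
# Sketch only: evidence-based vs ELBO-based selection of the prior precision
# `alpha` in conjugate Bayesian linear regression, where the ELBO uses a
# Gaussian q(w) restricted to a diagonal covariance. All specifics below
# (n, d, beta, data generation) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 50                        # over-parameterized: more weights than data points
beta = 25.0                          # noise precision, treated as known here
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.0, 0.5]        # only a few weights carry signal
y = X @ w_true + rng.normal(scale=beta ** -0.5, size=n)

def log_evidence(alpha):
    """Exact log marginal likelihood: y ~ N(0, X X^T / alpha + I / beta)."""
    C = X @ X.T / alpha + np.eye(n) / beta
    logdet = np.linalg.slogdet(C)[1]
    return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))

def elbo_diag_q(alpha):
    """ELBO at the best diagonal-covariance Gaussian q.

    The exact posterior is N(m, Lambda^{-1}) with Lambda = beta X^T X + alpha I.
    The optimal factorized Gaussian q keeps the posterior mean and uses
    variances 1 / Lambda_ii, so ELBO = log evidence - KL(q || exact posterior).
    """
    Lam = beta * X.T @ X + alpha * np.eye(d)
    S = np.diag(1.0 / np.diag(Lam))          # diagonal covariance of q
    kl = 0.5 * (np.trace(Lam @ S) - d
                - np.linalg.slogdet(Lam)[1] - np.linalg.slogdet(S)[1])
    return log_evidence(alpha) - kl

alphas = np.logspace(-3, 3, 200)
ev = np.array([log_evidence(a) for a in alphas])
el = np.array([elbo_diag_q(a) for a in alphas])
print(f"alpha selected by the evidence: {alphas[ev.argmax()]:.3g}")
print(f"alpha selected by the ELBO    : {alphas[el.argmax()]:.3g}")
```

Whether the two objectives pick the same prior precision, and in which direction they disagree, depends on the data and on how q is restricted; the paper studies exactly this question in an over-parameterized regression model and finds that, depending on the assumed covariance rank, the ELBO-based choice can either underfit or overfit.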

Abstract

The marginal likelihood, also known as the evidence, is regarded as a mathematical embodiment of Occam's razor, enabling model selection that avoids overfitting. The evidence lower bound (ELBO) objective from variational inference has also been used for similar purposes. Prior work has shown that restricting the approximate posterior family via a mean-field approximation can lead the ELBO to underfit. In this paper, we show how ELBO-based hyperparameter learning in a simple over-parameterized regression model can also produce overfitting, depending on the assumed rank of the covariance matrix in a Gaussian approximate posterior. Surprisingly, among only the underfit and overfit options, Bayesian model selection via the evidence itself sometimes prefers the overfit version, while the ELBO does not. Bayesian practitioners hoping to scale to large models should be cautious about how reduced-rank assumptions needed for tractability may impact the potential for model selection.
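
The quantitative relationship behind these claims is the standard variational-inference identity, written here in generic notation for weights w, data y, and hyperparameters θ (the paper's own notation may differ): for any approximate posterior q(w),

$$
\log p(y \mid \theta)
  = \underbrace{\mathbb{E}_{q(w)}\!\left[\log \frac{p(y, w \mid \theta)}{q(w)}\right]}_{\mathrm{ELBO}(q,\,\theta)}
  + \mathrm{KL}\!\left(q(w) \,\big\|\, p(w \mid y, \theta)\right).
$$

Maximizing the ELBO over θ therefore maximizes the log evidence minus the KL gap. With an unrestricted q the gap can be driven to zero and the two objectives coincide; with a restricted q, such as a Gaussian whose covariance is constrained to reduced rank, the gap itself varies with θ, and that θ-dependent bias is what can push ELBO-based hyperparameter learning toward underfitting or, as the paper shows, overfitting.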