Overfitting and Generalizing with (PAC) Bayesian Prediction in Noisy Binary Classification

arXiv stat.ML / 3/25/2026


Key Points

  • The paper studies a PAC-Bayes-type learning rule for binary classification that balances the training error of a randomized "posterior" predictor against its KL divergence to a pre-specified "prior", and relates it to a (modified) MDL learning rule (a worked form of this trade-off is sketched after this list).
  • With balance parameter λ=1 the rule recovers the (empirical) Bayes posterior, and a modified variant recovers the profile posterior; from a risk-minimization perspective, however, this Bayesian predictor overfits and can incur non-vanishing excess loss in the agnostic setting.
  • To avoid this, choosing λ ≫ 1 (which can be interpreted as using a sample-size-dependent prior) is argued to ensure uniformly vanishing excess loss even in the agnostic case.
  • The paper precisely characterizes the loss behavior produced by under- and over-regularization as a function of λ, identifying the regimes in which under-regularization is tempered and those in which it becomes catastrophic.
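
To make the trade-off in the first bullet concrete, here is one standard way such an objective is written (our notation and scaling, not necessarily the paper's exact form): with n training samples, empirical loss L̂ₙ(h), prior π, and randomized posterior ρ,

```latex
% A standard PAC-Bayes / Gibbs formulation of the trade-off
% (our notation; the paper's exact scaling may differ):
\min_{\rho}\;\; n\,\mathbb{E}_{h\sim\rho}\!\left[\hat{L}_n(h)\right]
  \;+\; \lambda\,\mathrm{KL}(\rho \,\|\, \pi)
% Its minimizer is the Gibbs posterior
%   \rho_\lambda(h) \propto \pi(h)\,\exp\!\left(-\tfrac{n}{\lambda}\,\hat{L}_n(h)\right),
% which, for log-loss and \lambda = 1, is exactly the Bayes posterior --
% consistent with the abstract's claim that \lambda = 1 recovers
% Bayesian prediction, while \lambda \gg 1 regularizes more strongly.
```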

Abstract

We consider a PAC-Bayes-type learning rule for binary classification, balancing the training error of a randomized "posterior" predictor with its KL divergence to a pre-specified "prior". This can be seen as an extension of a modified two-part-code Minimum Description Length (MDL) learning rule to continuous priors and randomized predictions. With a balancing parameter of λ=1, this learning rule recovers an (empirical) Bayes posterior, and a modified variant recovers the profile posterior, linking with standard Bayesian prediction (up to the treatment of the single-parameter noise level). However, from a risk-minimization prediction perspective, this Bayesian predictor overfits and can lead to non-vanishing excess loss in the agnostic case. Instead, a choice of λ ≫ 1, which can be seen as using a sample-size-dependent prior, ensures uniformly vanishing excess loss even in the agnostic case. We precisely characterize the effect of under-regularizing (and over-regularizing) as a function of the balance parameter λ, understanding the regimes in which this under-regularization is tempered or catastrophic. This work extends previous work by Zhu and Srebro [2025], which considered only discrete priors, to PAC-Bayes-type learning rules and, through their rigorous Bayesian interpretation, to Bayesian prediction more generally.
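
As an illustration of how λ tempers the posterior, here is a minimal numerical sketch with a discrete prior. All names here (gibbs_posterior, the two toy constant predictors, the scaling of λ) are our assumptions for illustration; the paper itself treats continuous priors and randomized predictions.

```python
import numpy as np

# Minimal sketch (our illustration, not the paper's code): a Gibbs /
# PAC-Bayes-type posterior over a finite hypothesis class. Each
# hypothesis is a constant predictor giving label-1 probability p.
# With log-loss, lam = 1 reproduces the usual Bayes posterior, while
# lam >> 1 tempers the posterior back toward the prior.

def gibbs_posterior(neg_log_liks, prior, lam):
    """Return rho(h) proportional to prior(h) * exp(-neg_log_liks[h] / lam).

    neg_log_liks[h] = sum_i -log p_h(y_i), i.e. n times the training log-loss.
    """
    log_w = np.log(prior) - neg_log_liks / lam
    log_w -= log_w.max()                    # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

# Toy data: noisy binary labels, as in the agnostic setting.
rng = np.random.default_rng(0)
y = rng.binomial(1, 0.5, size=100)

# Two hypothetical constant predictors: overconfident vs. calibrated.
probs = np.array([0.9, 0.5])
nll = np.array([-(y * np.log(p) + (1 - y) * np.log(1 - p)).sum() for p in probs])
prior = np.array([0.5, 0.5])

for lam in (1.0, 10.0):                     # lam = 1: Bayes; lam >> 1: tempered
    print(f"lam={lam:4.1f}  posterior={np.round(gibbs_posterior(nll, prior, lam), 3)}")
```

This toy example only shows the tempering mechanism: at λ=1 the weights are the Bayes posterior, and raising λ pulls the posterior back toward the prior. The abstract's claim is that this extra regularization (λ ≫ 1) is what keeps the excess loss uniformly vanishing in the agnostic case, whereas plain Bayes (λ=1) can overfit.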