Generalization error bounds for two-layer neural networks with Lipschitz loss function

arXiv stat.ML / 4/9/2026


Key Points

  • The paper derives generalization error bounds for training two-layer neural networks using Wasserstein-distance estimates between the true data distribution and its empirical measure (see the sketch after this list).
  • It does not require the loss function to be bounded, only Lipschitz, relying instead on moment bounds for the associated stochastic gradient method.
  • For independent test data, the authors show a dimension-free generalization rate of order O(n^{-1/2}), where n is the sample size.
  • When independence between training and test data is not assumed, the bound degrades to O(n^{-1/(d_in+d_out)}), where d_in and d_out are the input and output dimensions.
  • The resulting bounds (including coefficients) are computable before training and are supported by numerical simulations.
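
The Wasserstein route to such bounds follows a standard Kantorovich-Rubinstein argument. The display below is a generic sketch of that argument shape under a loss that is Lipschitz in the data, not the paper's exact statement or constants.

```latex
% Generic sketch (not the paper's exact theorem): if z \mapsto \ell(\theta, z) is
% L-Lipschitz in the data z, Kantorovich--Rubinstein duality bounds the gap between
% the true risk under the data law \mu and the empirical risk under \mu_n.
\[
  \Bigl| \mathbb{E}_{z \sim \mu}\,\ell(\theta, z)
         - \tfrac{1}{n} \textstyle\sum_{i=1}^{n} \ell(\theta, z_i) \Bigr|
  \;\le\; L \, W_1(\mu, \mu_n),
  \qquad
  \mu_n \;=\; \tfrac{1}{n} \textstyle\sum_{i=1}^{n} \delta_{z_i}.
\]
% Known estimates on \mathbb{E}[W_1(\mu, \mu_n)] scale like n^{-1/2} in favorable
% settings and roughly n^{-1/d} for data in dimension d; with z = (x, y) the
% relevant dimension is d_in + d_out, which matches the dependence quoted above.
```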

Abstract

We derive generalization error bounds for the training of two-layer neural networks without assuming boundedness of the loss function, using Wasserstein distance estimates on the discrepancy between a probability distribution and its associated empirical measure, together with moment bounds for the associated stochastic gradient method. In the case of independent test data, we obtain a dimension-free rate of order O(n^{-1/2}) on the n-sample generalization error, whereas without the independence assumption we derive a bound of order O(n^{-1/(d_{\rm in}+d_{\rm out})}), where d_{\rm in}, d_{\rm out} denote the input and output dimensions. Our bounds and their coefficients can be explicitly computed prior to the training of the model, and are confirmed by numerical simulations.
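
The claimed O(n^{-1/2}) behaviour can also be probed numerically. Below is a minimal sketch, not the authors' simulation code: it trains a small two-layer tanh network with plain SGD under an absolute-error (hence Lipschitz but unbounded) loss on synthetic data, and prints the train/test risk gap alongside n^{-1/2} for growing sample sizes. The data distribution, target function, width, and step size are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): empirical train/test gap of a two-layer
# network trained by plain SGD, for several sample sizes n.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, width = 5, 1, 50            # illustrative dimensions

def target(x):
    # Hypothetical ground-truth map generating the labels.
    return np.sin(x.sum(axis=1, keepdims=True))

def init_params():
    W1 = rng.normal(scale=1 / np.sqrt(d_in), size=(width, d_in))
    b1 = np.zeros((width, 1))
    W2 = rng.normal(scale=1 / np.sqrt(width), size=(d_out, width))
    return W1, b1, W2

def forward(params, X):
    W1, b1, W2 = params
    H = np.tanh(W1 @ X.T + b1)            # hidden activations, width x n
    return (W2 @ H).T                     # predictions, n x d_out

def risk(params, X, Y):
    # Absolute-error loss: Lipschitz in the prediction, but unbounded.
    return np.mean(np.abs(forward(params, X) - Y))

def sgd(params, X, Y, steps=20000, lr=0.05):
    W1, b1, W2 = params
    n = X.shape[0]
    for _ in range(steps):
        i = rng.integers(n)
        x, y = X[i:i + 1], Y[i:i + 1]
        h = np.tanh(W1 @ x.T + b1)        # width x 1
        pred = (W2 @ h).T                 # 1 x d_out
        g = np.sign(pred - y)             # subgradient of the absolute-error loss
        gW2 = g.T @ h.T
        gpre = (W2.T @ g.T) * (1 - h**2)  # backprop through tanh
        W1 -= lr * (gpre @ x)
        b1 -= lr * gpre
        W2 -= lr * gW2
    return W1, b1, W2

# Independent test sample, as in the dimension-free O(n^{-1/2}) setting.
X_test = rng.uniform(-1, 1, size=(5000, d_in))
Y_test = target(X_test)

for n in [100, 400, 1600, 6400]:
    X = rng.uniform(-1, 1, size=(n, d_in))
    Y = target(X)
    params = sgd(init_params(), X, Y)
    gap = abs(risk(params, X_test, Y_test) - risk(params, X, Y))
    print(f"n={n:5d}  empirical gap={gap:.4f}  n^(-1/2)={n**-0.5:.4f}")
```

The absolute-error loss is used here only because it is globally Lipschitz yet unbounded, mirroring the setting described in the abstract; any other Lipschitz loss would serve equally well in this sketch.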