Generalization error bounds for two-layer neural networks with Lipschitz loss function

arXiv stat.ML / 4/9/2026


Key Points

  • The paper derives generalization error bounds for training two-layer neural networks using Wasserstein-distance estimates between the true data distribution and its empirical measure (see the sketch after this list).
  • It does not require the loss function to be bounded, only Lipschitz, relying instead on moment bounds for the associated stochastic gradient method.
  • For independent test data, the authors show a dimension-free generalization rate of order O(n^{-1/2}), where n is the sample size.
  • When independence between training and test data is not assumed, the bound degrades to O(n^{-1/(d_in+d_out)}), where d_in and d_out are the input and output dimensions.
  • The resulting bounds (including coefficients) are computable before training and are supported by numerical simulations.
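
The Wasserstein route to such bounds follows a standard Kantorovich-Rubinstein argument. The display below is a generic sketch of that argument shape under a loss that is Lipschitz in the data, not the paper's exact statement or constants.

```latex
% Generic sketch (not the paper's exact theorem): if z \mapsto \ell(\theta, z) is
% L-Lipschitz in the data z, Kantorovich--Rubinstein duality bounds the gap between
% the true risk under the data law \mu and the empirical risk under \mu_n.
\[
  \Bigl| \mathbb{E}_{z \sim \mu}\,\ell(\theta, z)
         - \tfrac{1}{n} \textstyle\sum_{i=1}^{n} \ell(\theta, z_i) \Bigr|
  \;\le\; L \, W_1(\mu, \mu_n),
  \qquad
  \mu_n \;=\; \tfrac{1}{n} \textstyle\sum_{i=1}^{n} \delta_{z_i}.
\]
% Known estimates on \mathbb{E}[W_1(\mu, \mu_n)] scale like n^{-1/2} in favorable
% settings and roughly n^{-1/d} for data in dimension d; with z = (x, y) the
% relevant dimension is d_in + d_out, which matches the dependence quoted above.
```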

Abstract

We derive generalization error bounds for the training of two-layer neural networks without assuming boundedness of the loss function, using Wasserstein distance estimates on the discrepancy between a probability distribution and its associated empirical measure, together with moment bounds for the associated stochastic gradient method. In the case of independent test data, we obtain a dimension-free rate of order O(n^{-1/2}) on the n-sample generalization error, whereas without the independence assumption we derive a bound of order O(n^{-1/(d_{\rm in}+d_{\rm out})}), where d_{\rm in}, d_{\rm out} denote the input and output dimensions. Our bounds and their coefficients can be explicitly computed prior to the training of the model, and are confirmed by numerical simulations.
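
The claimed O(n^{-1/2}) behaviour can also be probed numerically. Below is a minimal sketch, not the authors' simulation code: it trains a small two-layer tanh network with plain SGD under an absolute-error (hence Lipschitz but unbounded) loss on synthetic data, and prints the train/test risk gap alongside n^{-1/2} for growing sample sizes. The data distribution, target function, width, and step size are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): empirical train/test gap of a two-layer
# network trained by plain SGD, for several sample sizes n.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, width = 5, 1, 50            # illustrative dimensions

def target(x):
    # Hypothetical ground-truth map generating the labels.
    return np.sin(x.sum(axis=1, keepdims=True))

def init_params():
    W1 = rng.normal(scale=1 / np.sqrt(d_in), size=(width, d_in))
    b1 = np.zeros((width, 1))
    W2 = rng.normal(scale=1 / np.sqrt(width), size=(d_out, width))
    return W1, b1, W2

def forward(params, X):
    W1, b1, W2 = params
    H = np.tanh(W1 @ X.T + b1)            # hidden activations, width x n
    return (W2 @ H).T                     # predictions, n x d_out

def risk(params, X, Y):
    # Absolute-error loss: Lipschitz in the prediction, but unbounded.
    return np.mean(np.abs(forward(params, X) - Y))

def sgd(params, X, Y, steps=20000, lr=0.05):
    W1, b1, W2 = params
    n = X.shape[0]
    for _ in range(steps):
        i = rng.integers(n)
        x, y = X[i:i + 1], Y[i:i + 1]
        h = np.tanh(W1 @ x.T + b1)        # width x 1
        pred = (W2 @ h).T                 # 1 x d_out
        g = np.sign(pred - y)             # subgradient of the absolute-error loss
        gW2 = g.T @ h.T
        gpre = (W2.T @ g.T) * (1 - h**2)  # backprop through tanh
        W1 -= lr * (gpre @ x)
        b1 -= lr * gpre
        W2 -= lr * gW2
    return W1, b1, W2

# Independent test sample, as in the dimension-free O(n^{-1/2}) setting.
X_test = rng.uniform(-1, 1, size=(5000, d_in))
Y_test = target(X_test)

for n in [100, 400, 1600, 6400]:
    X = rng.uniform(-1, 1, size=(n, d_in))
    Y = target(X)
    params = sgd(init_params(), X, Y)
    gap = abs(risk(params, X_test, Y_test) - risk(params, X, Y))
    print(f"n={n:5d}  empirical gap={gap:.4f}  n^(-1/2)={n**-0.5:.4f}")
```

The absolute-error loss is used here only because it is globally Lipschitz yet unbounded, mirroring the setting described in the abstract; any other Lipschitz loss would serve equally well in this sketch.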