Information-Theoretic Generalization Bounds for Stochastic Gradient Descent with Predictable Virtual Noise
arXiv cs.LG / 5/4/2026
💬 Opinion · Models & Research
Key Points
- The paper derives information-theoretic generalization bounds for stochastic gradient descent (SGD) by relating expected generalization error to mutual information between learned parameters and training data.
- It improves prior “virtual noise” proof techniques by introducing predictable, history-adaptive virtual perturbations whose covariance can depend on past SGD history while remaining independent of current/future randomness.
- The new bounds rely on conditional relative-entropy (conditional Gaussian KL) arguments, replacing the fixed sensitivity and deviation terms of earlier analyses with conditional, history-adaptive versions, plus an output-sensitivity penalty driven by the accumulated perturbation covariance.
- When adaptive covariance is data-dependent, the authors decouple local Gaussian smoothing from a global comparison to a reference kernel, adding a KL-based “covariance-comparison” cost for using an admissible but different reference geometry.
- Under certain admissible covariance synchronization rules, the framework recovers existing fixed-noise-style bounds, while also extending virtual perturbation analysis to SGD settings with path-dependent geometries without changing the SGD algorithm.
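For orientation, mutual-information bounds of the kind referenced in the first point typically take the shape of the classical Xu–Raginsky inequality. The paper's adaptive bounds differ in their sensitivity and covariance terms, so this is only the baseline form; the symbols $n$, $\sigma$, $L_\mu$, and $\hat{L}_S$ below are standard notation assumed for illustration, not taken from the paper:

```latex
% Baseline mutual-information generalization bound (Xu & Raginsky, 2017):
% if the loss \ell(w, z) is \sigma-subgaussian under the data distribution,
% then for a training set S of n i.i.d. samples and learned weights W,
\[
  \bigl| \mathbb{E}\bigl[ L_\mu(W) - \hat{L}_S(W) \bigr] \bigr|
  \;\le\; \sqrt{\frac{2\sigma^2}{n}\, I(W; S)},
\]
% where L_\mu is the population risk, \hat{L}_S the empirical risk on S,
% and I(W; S) the mutual information between parameters and training data.
```

Virtual-noise techniques bound $I(W;S)$ for SGD by comparing the (deterministic, given $S$) trajectory to an auxiliary Gaussian-perturbed one via chain-rule and KL arguments; the paper's contribution is letting that auxiliary noise's covariance adapt to past iterates while remaining predictable.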