On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD
arXiv cs.LG / 3/12/2026
Key Points
- The authors study SGD with label noise on a two-layer over-parameterized linear network to understand its implicit bias and generalization behavior (a toy sketch of this setup follows the list).
- They uncover a two-phase learning dynamic: in Phase I the weights shrink and the model escapes the lazy regime; in Phase II the predictor's alignment with the ground-truth interpolator increases until convergence.
- The analysis highlights label noise as a key driver of the transition from the lazy regime to the rich regime and provides a minimal explanation for its empirical effectiveness.
- They extend the insights to Sharpness-Aware Minimization (SAM) and validate the theory with extensive experiments on synthetic and real-world data, with code released.
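The following is a minimal sketch of the setup described above, not the authors' released code: label-noise SGD on a scalar-output two-layer linear network f(x) = aᵀ(Wx), where fresh Gaussian noise is added to the sampled label at every step. All dimensions, the step size eta, the noise level sigma, and the sparse ground truth w_star are illustrative assumptions. The loop tracks the norm of the effective predictor v = Wᵀa and its cosine alignment with the ground truth, the two quantities behind the paper's Phase I (shrinking weights) and Phase II (increasing alignment).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the paper):
# n samples, input dimension d > n (over-parameterized), hidden width h.
n, d, h = 20, 50, 100
X = rng.standard_normal((n, d))
w_star = np.zeros(d)
w_star[:5] = 1.0                 # hypothetical sparse ground truth
y = X @ w_star                   # clean labels

# Two-layer linear network f(x) = a^T (W x); effective predictor v = W^T a.
W = rng.standard_normal((h, d)) / np.sqrt(d)
a = rng.standard_normal(h) / np.sqrt(h)

eta, sigma, steps = 1e-3, 0.5, 20000   # step size and noise level are illustrative
for t in range(steps):
    i = rng.integers(n)
    # Defining feature of label-noise SGD: fresh noise on the label each step.
    noisy_y = y[i] + sigma * rng.standard_normal()
    hidden = W @ X[i]
    g = a @ hidden - noisy_y            # residual on the noisy label
    # Simultaneous SGD step on the squared loss 0.5 * g^2.
    grad_a = g * hidden
    grad_W = g * np.outer(a, X[i])
    a -= eta * grad_a
    W -= eta * grad_W
    if t % 4000 == 0:
        v = W.T @ a                     # effective linear predictor
        cos = v @ w_star / (np.linalg.norm(v) * np.linalg.norm(w_star) + 1e-12)
        print(f"step {t:6d}  ||v|| = {np.linalg.norm(v):.3f}  cos(v, w*) = {cos:.3f}")
```

A SAM variant of the same loop would evaluate the gradient at weights perturbed in the direction of the current gradient before applying the update at the original weights; that modification is omitted here for brevity.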
Related Articles
I Was Wrong About AI Coding Assistants. Here's What Changed My Mind (and What I Built About It).
Dev.to

Interesting loop
Reddit r/LocalLLaMA
Qwen3.5-122B-A10B Uncensored (Aggressive) — GGUF Release + new K_P Quants
Reddit r/LocalLLaMA
A supervisor or "manager" AI agent is the wrong way to control AI
Reddit r/artificial
FeatherOps: Fast fp8 matmul on RDNA3 without native fp8
Reddit r/LocalLLaMA