Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs
arXiv stat.ML / April 15, 2026
💬 Opinion / Ideas & Deep Analysis / Models & Research
Key Points
- The paper analyzes the gradient flow dynamics of one-hidden-layer ReLU networks trained with the mean squared error (square) loss, in the regime of orthogonal inputs and small initialization (a toy simulation of this setting appears after this list).
- It gives a precise characterization of these dynamics, showing that gradient flow converges to zero training loss despite the non-convexity of the problem.
- The authors characterize the network's implicit bias: among the solutions that fit the data, training favors one of minimum variation norm (see the variation-norm helper below).
- The study quantifies the "initial alignment" phenomenon, in which neuron directions align with a few data-determined directions while the weights are still small, and proves that training follows a saddle-to-saddle dynamical path: long loss plateaus separated by sharp drops (see the alignment probe below).
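
The setting in these key points is easy to reproduce qualitatively. Below is a minimal NumPy sketch (mine, not the authors' code): an Euler discretization of gradient flow for a one-hidden-layer ReLU network trained with square loss on orthonormal inputs, started from a small initialization. The width, initialization scale, step size, and labels are all illustrative assumptions; with a small enough scale, the printed loss curve shows the long plateaus and sharp drops characteristic of saddle-to-saddle dynamics before settling near zero.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 5, 50                  # n orthonormal inputs in R^n, m hidden neurons
X = np.eye(n)                 # orthogonal inputs: x_i = e_i
y = rng.choice([-1.0, 1.0], size=n)       # arbitrary +/-1 labels

scale = 1e-4                  # small initialization scale (the regime analyzed)
W = scale * rng.standard_normal((m, n))   # hidden weights w_j (rows of W)
a = scale * rng.standard_normal(m)        # output weights a_j

lr, steps = 1e-2, 200_000     # small step size: Euler scheme approximating gradient flow

for t in range(steps):
    pre = X @ W.T                  # pre-activations <w_j, x_i>, shape (n, m)
    act = np.maximum(pre, 0.0)     # ReLU activations
    resid = act @ a - y            # residuals f(x_i) - y_i, shape (n,)
    loss = 0.5 * np.mean(resid ** 2)

    # gradients of the (mean) square loss w.r.t. a and W
    grad_a = act.T @ resid / n
    grad_W = ((resid[:, None] * (pre > 0.0)) * a[None, :]).T @ X / n

    a -= lr * grad_a
    W -= lr * grad_W

    if t % 20_000 == 0:
        print(f"step {t:>7d}   loss {loss:.3e}")   # long plateaus, then sharp drops
```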
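
The implicit-bias claim refers to the variation norm. For a finite-width network f(x) = Σ_j a_j ReLU(⟨w_j, x⟩), the quantity Σ_j |a_j| · ‖w_j‖₂ is the standard finite-width proxy (it upper-bounds the variation norm of the function the network represents). The hypothetical helper below, applied to the (W, a) from the sketch above, is one way to probe the claim empirically; it is not code from the paper.

```python
import numpy as np

def variation_norm_proxy(W: np.ndarray, a: np.ndarray) -> float:
    """sum_j |a_j| * ||w_j||_2 for hidden weights W (shape (m, d)) and output weights a (shape (m,))."""
    return float(np.sum(np.abs(a) * np.linalg.norm(W, axis=1)))

# e.g. with the (W, a) produced by the training sketch above:
# print(variation_norm_proxy(W, a))
```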
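
Finally, a crude probe of the initial-alignment claim (my own diagnostic, not the paper's procedure): with small initialization, neuron directions w_j/‖w_j‖ rotate toward a few data-determined directions while their norms are still tiny. Cosine similarities between the neurons and the input directions make this visible.

```python
import numpy as np

def alignment_matrix(W: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Cosine similarity between each neuron direction (row of W) and each input (row of X)."""
    Wn = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-12)
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    return Wn @ Xn.T   # shape (m, n); entries near +/-1 mean strong alignment

# Printed during the first loss plateau of the training sketch, most rows
# concentrate their mass on a handful of columns: many neurons, few directions.
```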