Zeroth-Order Optimization at the Edge of Stability
arXiv cs.LG / April 17, 2026
Key Points
- The paper studies zeroth-order (ZO) optimization methods built on a two-point gradient-free estimator and derives an explicit step-size condition for mean-square linear stability (a minimal sketch of such an estimator follows this list).
- It identifies a key difference from first-order (FO) optimization: FO stability depends mainly on the largest Hessian eigenvalue, while ZO stability is influenced by the entire Hessian spectrum (the classical FO condition is worked out below for contrast).
- Because computing the full Hessian eigenspectrum is impractical for deep networks, the authors give practical stability bounds that require only the largest eigenvalue and the Hessian trace (both cheap to estimate with Hessian-vector products; see the final sketch below).
- Experiments indicate that several full-batch ZO methods (ZO-GD, ZO-GDM, and ZO-Adam) tend to run near the predicted “edge of stability” boundary across multiple deep-learning training tasks.
- The findings suggest an implicit regularization effect specific to ZO methods: large step sizes mainly suppress the Hessian trace, whereas in FO methods they suppress the top eigenvalue.
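As a concrete reference for the first bullet, here is a minimal NumPy sketch of the standard symmetric two-point ZO estimator (sphere-sampled, dimension-scaled), driving ZO-GD on a toy quadratic. The function names, smoothing radius `mu`, step size `eta`, and the toy Hessian are illustrative choices, not taken from the paper, whose exact sampling scheme and scaling may differ.

```python
import numpy as np

def two_point_zo_gradient(f, x, mu=1e-3, rng=None):
    """Gradient estimate of f at x from a single two-point query pair.

    Standard symmetric-difference ZO estimator
        g = d * (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u,
    with u uniform on the unit sphere and d = len(x); the factor d
    makes the estimate approximately unbiased for the smoothed gradient.
    Averaging several draws reduces variance.
    """
    rng = rng or np.random.default_rng()
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                 # uniform direction on the sphere
    fd = (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu)
    return d * fd * u

# Toy usage: ZO-GD on the quadratic f(x) = 0.5 * x^T H x.
H = np.diag([10.0, 1.0, 0.1])
f = lambda x: 0.5 * x @ H @ x
x = np.ones(3)
eta = 0.02   # illustrative step size; per the paper, mean-square
             # stability of ZO-GD depends on the Hessian spectrum
for _ in range(500):
    x -= eta * two_point_zo_gradient(f, x)
print("distance to optimum ≈", np.linalg.norm(x))  # shrinks toward 0
```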
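For contrast with the second bullet, the classical FO stability calculation on a quadratic model (a standard fact, not a result of this paper):

```latex
% Gradient descent on a quadratic f(x) = \tfrac{1}{2} x^\top H x
% produces the iterates
\[
  x_{t+1} = (I - \eta H)\, x_t ,
\]
% which contract iff |1 - \eta \lambda_i| < 1 for every eigenvalue
% \lambda_i of H, i.e.
\[
  \eta < \frac{2}{\lambda_{\max}(H)} .
\]
```

So FO stability is governed by the top eigenvalue alone; the paper's mean-square condition for two-point ZO methods instead couples the step size to the whole spectrum, which its practical bounds summarize through the largest eigenvalue and the trace.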
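The two quantities in the third bullet can be estimated without forming the Hessian, using only Hessian-vector products. The sketch below uses power iteration for the largest eigenvalue and Hutchinson's estimator for the trace; it assumes gradient access for the diagnostic (in a strict ZO setting the gradient would itself be estimated), and all names, tolerances, and the toy problem are illustrative rather than the paper's procedure.

```python
import numpy as np

def hvp(grad_f, x, v, eps=1e-5):
    """Hessian-vector product H(x) @ v via central differences of the gradient."""
    return (grad_f(x + eps * v) - grad_f(x - eps * v)) / (2.0 * eps)

def lambda_max(grad_f, x, iters=100, rng=None):
    """Largest-magnitude Hessian eigenvalue by power iteration on HVPs."""
    rng = rng or np.random.default_rng()
    v = rng.standard_normal(x.size)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        w = hvp(grad_f, x, v)
        lam = v @ w                        # Rayleigh quotient estimate
        v = w / np.linalg.norm(w)
    return lam

def hessian_trace(grad_f, x, samples=200, rng=None):
    """Hutchinson estimator: tr(H) = E[z^T H z] for Rademacher z."""
    rng = rng or np.random.default_rng()
    total = 0.0
    for _ in range(samples):
        z = rng.choice([-1.0, 1.0], size=x.size)
        total += z @ hvp(grad_f, x, z)
    return total / samples

# Toy check on the quadratic above: grad f(x) = H x, so both are exact.
H = np.diag([10.0, 1.0, 0.1])
grad_f = lambda x: H @ x
x0 = np.ones(3)
print("lambda_max ≈", lambda_max(grad_f, x0))    # ≈ 10.0
print("trace      ≈", hessian_trace(grad_f, x0)) # ≈ 11.1
```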