Learning Dynamics of Zeroth-Order Optimization: A Kernel Perspective
arXiv cs.LG / 5/6/2026
Key Points
- The paper starts from the classical theoretical result that zeroth-order (ZO) optimization slows down as model dimension grows relative to first-order methods, a prediction that conflicts with ZO's empirical success.
- It derives the one-step learning dynamics of ZO-SGD and shows that the empirical Neural Tangent Kernel (eNTK) emerges as the central quantity controlling learning behavior (a minimal ZO-SGD sketch follows this list).
- The authors interpret the entries of the ZO-induced eNTK as inner products of neural tangent vectors projected onto a random low-dimensional subspace (see the schematic kernel view after the list).
- Using the Johnson–Lindenstrauss lemma, they argue that the fidelity of the ZO eNTK approximation is governed mainly by the number of random perturbations rather than by the full parameter dimension (a toy numerical check appears below).
- They conclude that this dimension-free approximation error helps explain why ZO methods can scale to fine-tuning large language models despite the pessimistic dimension-dependent worst-case theory.
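
To make the one-step dynamics concrete, below is a minimal sketch of a single ZO-SGD update using a two-point (SPSA-style) finite-difference estimator averaged over several random Gaussian directions. This is a generic, assumed formulation; the names `zo_sgd_step`, `n_perturbations`, and `eps` are illustrative and not taken from the paper.

```python
import numpy as np

def zo_sgd_step(params, loss_fn, lr=1e-3, n_perturbations=16, eps=1e-3, rng=None):
    """One ZO-SGD step: estimate the gradient from finite differences
    along random Gaussian directions, then take a plain SGD step."""
    rng = np.random.default_rng() if rng is None else rng
    grad_est = np.zeros_like(params)
    for _ in range(n_perturbations):
        u = rng.standard_normal(params.shape)              # random perturbation direction
        # two-point directional-derivative estimate along u
        delta = (loss_fn(params + eps * u) - loss_fn(params - eps * u)) / (2 * eps)
        grad_est += delta * u                               # unbiased for the true gradient in expectation
    grad_est /= n_perturbations
    return params - lr * grad_est

# Toy usage: a quadratic loss in 1,000 dimensions.
d = 1_000
target = np.ones(d)
loss = lambda w: 0.5 * np.sum((w - target) ** 2)
w = np.zeros(d)
for _ in range(500):
    w = zo_sgd_step(w, loss, lr=0.05, n_perturbations=32)
print(f"final loss: {loss(w):.3f}")
```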
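A schematic reconstruction of the kernel view and of the Johnson–Lindenstrauss argument, stated under generic assumptions; the notation below is ours and may differ from the paper's.

```latex
% ZO eNTK as projected inner products (schematic).
% With Gaussian perturbation directions u_1,\dots,u_n, define the random projection
% P v = \tfrac{1}{\sqrt{n}}\big(\langle u_1, v\rangle, \dots, \langle u_n, v\rangle\big).
% The kernel induced by the two-point ZO estimator is then
\[
\hat K(x, x') \;=\; \big\langle P\,\nabla_\theta f(x),\; P\,\nabla_\theta f(x') \big\rangle
\;=\; \frac{1}{n}\sum_{k=1}^{n} \langle u_k, \nabla_\theta f(x)\rangle\,
      \langle u_k, \nabla_\theta f(x')\rangle,
\]
% whose expectation is the usual eNTK K(x, x') = \langle \nabla_\theta f(x), \nabla_\theta f(x')\rangle.
% A Johnson--Lindenstrauss-style bound: for N inputs and n = O(\varepsilon^{-2}\log N)
% perturbations, with high probability
\[
\big|\hat K(x, x') - K(x, x')\big|
\;\le\; \varepsilon\,\|\nabla_\theta f(x)\|\;\|\nabla_\theta f(x')\|
\quad \text{for all pairs } (x, x'),
\]
% a bound governed by the number of perturbations n, not by the parameter dimension d.
```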
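As a quick sanity check of the dimension-free claim (our own toy experiment, not one reported in the paper), the snippet below holds the number of perturbation directions fixed and varies the ambient dimension; the relative inner-product error stays roughly constant.

```python
import numpy as np

def projection_error(d, n, trials=50, seed=0):
    """Mean relative error of <Pa, Pb> versus <a, b>, where P projects
    onto n random Gaussian directions in dimension d."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(trials):
        a, b = rng.standard_normal(d), rng.standard_normal(d)
        U = rng.standard_normal((n, d)) / np.sqrt(n)     # rows: scaled perturbation directions
        approx = (U @ a) @ (U @ b)                       # inner product after projection
        errs.append(abs(approx - a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return float(np.mean(errs))

for d in (1_000, 50_000):
    # Error tracks 1/sqrt(n) and is roughly independent of d for fixed n.
    print(f"d={d:>6}, n=64, mean relative error: {projection_error(d, n=64):.3f}")
```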