Model Evolution Under Zeroth-Order Optimization: A Neural Tangent Kernel Perspective
arXiv cs.LG · 2026-03-24
Key Points
- The paper studies zeroth-order (ZO) optimization for neural networks, where gradients are estimated using only forward passes and backpropagation is avoided to save memory.
- It introduces the Neural Zeroth-order Kernel (NZK) to characterize how neural models evolve in function space under ZO updates, addressing the difficulty caused by noisy stochastic gradient estimates.
- For linear models, the authors prove that the expected NZK is invariant during training and derive a closed-form model evolution under squared loss based on moments of the random perturbation directions.
- The analysis extends to linearized neural networks, interpreting ZO updates as a form of kernel gradient descent under the NZK framework.
- Experiments on MNIST, CIFAR-10, and Tiny ImageNet support the theory and show that convergence accelerates when a single shared random perturbation vector is used.
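The forward-pass-only gradient estimation described in the first bullet can be sketched with a standard two-point (SPSA-style) estimator along random perturbation directions. This is a minimal illustration, not the paper's exact scheme: the function `zo_gradient`, its parameters, and the quadratic toy objective are assumptions for the example.

```python
import numpy as np

def zo_gradient(f, theta, mu=1e-3, n_dirs=1, rng=None):
    """Two-point zeroth-order gradient estimate of f at theta.

    Uses only function evaluations (forward passes), no backpropagation:
        g ≈ (1/n) Σ_i [f(θ + μ u_i) − f(θ − μ u_i)] / (2μ) · u_i
    where u_i are random Gaussian perturbation directions.
    """
    rng = np.random.default_rng() if rng is None else rng
    g = np.zeros_like(theta)
    for _ in range(n_dirs):
        u = rng.standard_normal(theta.shape)  # random perturbation direction
        g += (f(theta + mu * u) - f(theta - mu * u)) / (2 * mu) * u
    return g / n_dirs

# Usage: ZO-SGD on a toy quadratic loss (central differences are exact here).
rng = np.random.default_rng(0)
f = lambda th: float(np.sum(th ** 2))
theta = np.ones(5)
for _ in range(200):
    theta -= 0.1 * zo_gradient(f, theta, rng=rng)
```

Because each step moves along a random direction scaled by a directional-derivative estimate, the parameter update is noisy; the NZK analysis summarized above characterizes how such updates evolve the model in function space in expectation.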
