Randomized Subspace Nesterov Accelerated Gradient

arXiv stat.ML / 5/4/2026


Key Points

  • Randomized-subspace optimization lowers the cost of first-order methods by operating on low-dimensional projected-gradient information, which is particularly useful for forward-mode automatic differentiation and communication-limited environments.
  • The paper develops randomized-subspace versions of Nesterov accelerated gradient for smooth convex and smooth strongly convex problems, assuming matrix smoothness and sketch moment conditions.
  • A central contribution is a tailored three-sequence formulation that matches classical Nesterov methods in the full-dimensional limit.
  • The authors provide accelerated oracle-complexity guarantees and explain explicitly how matrix smoothness assumptions and the sketch distribution affect the convergence rate.
  • The framework enables comparisons across different sketch families and identifies regimes where randomized-subspace acceleration can outperform full-dimensional Nesterov acceleration in oracle complexity.
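To make the projected-gradient idea concrete, here is a minimal sketch (not the paper's actual algorithm) of a randomized-subspace gradient step on a simple quadratic: draw a Gaussian sketch matrix S, query only the s projected derivatives S^T ∇f(x) (exactly what s forward-mode AD passes would provide), and update within the sketched subspace. The objective, dimensions, and step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, s = 50, 5                       # ambient dimension, sketch dimension (illustrative)

# Toy smooth, strongly convex objective: f(x) = 0.5 * x^T A x with A >= I.
Q = rng.standard_normal((d, d))
A = Q.T @ Q / d + np.eye(d)
grad = lambda v: A @ v

x = rng.standard_normal(d)
f0 = 0.5 * x @ A @ x
eta = 0.02                         # conservative step size chosen for this toy problem

for _ in range(2000):
    # Gaussian sketch scaled so E[S S^T] = I, making S S^T grad(x)
    # an unbiased estimate of the full gradient.
    S = rng.standard_normal((d, s)) / np.sqrt(s)
    g_proj = S.T @ grad(x)         # only s directional derivatives are needed
    x = x - eta * (S @ g_proj)     # update confined to the sketched subspace

print(0.5 * x @ A @ x)             # far below the initial value f0
```

Each iteration touches only an s-dimensional projection of the gradient, which is the cost saving the key points refer to; the paper's contribution is showing how to *accelerate* such schemes rather than this plain sketched descent.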

Abstract

Randomized-subspace methods reduce the cost of first-order optimization by using only low-dimensional projected-gradient information, a feature that is attractive in forward-mode automatic differentiation and communication-limited settings. While Nesterov acceleration is well understood for full-gradient and coordinate-based methods, obtaining accelerated methods for general subspace sketches that use only projected-gradient information and can improve over full-dimensional Nesterov acceleration in oracle complexity is technically nontrivial. We develop randomized-subspace Nesterov accelerated gradient methods for smooth convex and smooth strongly convex optimization under matrix smoothness and generic sketch moment assumptions. The key technical ingredient is a three-sequence formulation tailored to matrix smoothness, which recovers the corresponding classical Nesterov methods in the full-dimensional case. The resulting theory establishes accelerated oracle-complexity guarantees and makes explicit how matrix smoothness and the sketch distribution enter the complexity. It also provides a unified basis for comparing sketch families and identifying when randomized-subspace acceleration improves over full-dimensional Nesterov acceleration in oracle complexity.
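For orientation, the classical full-dimensional three-sequence (linear-coupling) form of Nesterov's method for smooth convex problems, which the paper's randomized-subspace scheme is said to recover in the full-dimensional case, can be sketched as follows. The quadratic test problem and iteration count are illustrative assumptions; the paper's actual method replaces the full gradient with sketched projected-gradient information.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 50
Q = rng.standard_normal((d, d))
A = Q.T @ Q / d                    # smooth convex quadratic f(x) = 0.5 * x^T A x
L = float(np.linalg.eigvalsh(A).max())   # smoothness constant
grad = lambda v: A @ v

x = z = x0 = rng.standard_normal(d)
f0 = 0.5 * x0 @ A @ x0

for k in range(200):
    theta = 2.0 / (k + 2)
    y = (1 - theta) * x + theta * z   # coupling sequence: blend of the other two
    g = grad(y)
    x = y - g / L                     # "primal" sequence: short gradient step
    z = z - g / (theta * L)           # "dual" sequence: aggressive aggregated step

print(0.5 * x @ A @ x)                # decays at the accelerated O(L / k^2) rate
```

The three sequences (y, x, z) are exactly the structure the abstract refers to; the technical work in the paper is adapting this coupling to matrix smoothness and to updates built from sketched gradients only.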