Lower Bounds and Proximally Anchored SGD for Non-Convex Optimization under Unbounded Variance

arXiv cs.LG / 2026/4/21

Key Points

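  • The paper analyzes stochastic first-order non-convex optimization when the gradient variance is not uniformly bounded, focusing on the Blum-Gladyshev (BG-0) condition, which allows the variance to grow quadratically with distance.
  • It establishes information-theoretic lower bounds: finding an ε-stationary point requires Ω(ε^-6) stochastic BG-0 oracle queries for smooth functions and Ω(ε^-4) queries under mean-square smoothness.
  • These bounds quantify an unavoidable degradation relative to the classical bounded-variance complexities (Ω(ε^-4) in the smooth case and Ω(ε^-3) in the mean-square smooth case).
  • To match the lower bounds, the paper considers Proximally Anchored STochastic Approximation (PASTA), which combines Halpern anchoring with Tikhonov regularization to dynamically suppress the variance explosion permitted by the BG-0 oracle.
  • PASTA is proven to achieve minimax-optimal oracle complexity across several non-convex regimes, including smooth, mean-square smooth, weakly convex, star-convex, and Polyak–Łojasiewicz functions, even on an unbounded domain with unbounded stochastic gradients.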
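
The rates quoted in the key points can be set side by side. The first line is one common way to write a variance that grows quadratically with distance. The constants \sigma_0 and \sigma_1 and the reference point x_0 are notation assumed here for illustration, and the paper's exact statement of BG-0 may differ.

```latex
% Assumed illustrative form of a quadratically growing variance bound
% (sigma_0, sigma_1, x_0 are hypothetical notation, not taken from the paper)
\mathbb{E}\,\|g(x) - \nabla f(x)\|^2 \le \sigma_0^2 + \sigma_1^2 \|x - x_0\|^2

% Oracle-complexity lower bounds for finding an \epsilon-stationary point,
% as stated in the summary
\begin{array}{lcc}
\text{Setting} & \text{Bounded variance} & \text{BG-0} \\
\hline
\text{Smooth} & \Omega(\epsilon^{-4}) & \Omega(\epsilon^{-6}) \\
\text{Mean-square smooth} & \Omega(\epsilon^{-3}) & \Omega(\epsilon^{-4})
\end{array}
```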

Abstract

Analysis of Stochastic Gradient Descent (SGD) and its variants typically relies on the assumption of uniformly bounded variance, a condition that frequently fails in practical non-convex settings, such as neural network training, as well as in several elementary optimization settings. While several relaxations have been explored in the literature, the Blum-Gladyshev (BG-0) condition, which permits the variance to grow quadratically with distance, has recently been shown to be the weakest condition. However, the oracle complexity of stochastic first-order non-convex optimization under BG-0 has remained underexplored. In this paper, we address this gap and establish information-theoretic lower bounds, proving that finding an \epsilon-stationary point requires \Omega(\epsilon^{-6}) stochastic BG-0 oracle queries for smooth functions and \Omega(\epsilon^{-4}) queries under mean-square smoothness. These limits demonstrate an unavoidable degradation from classical bounded-variance complexities, i.e., \Omega(\epsilon^{-4}) and \Omega(\epsilon^{-3}) for the smooth and mean-square smooth cases, respectively. To match these lower bounds, we consider Proximally Anchored STochastic Approximation (PASTA), a unified algorithmic framework that couples Halpern anchoring with Tikhonov regularization to dynamically mitigate the extra variance explosion term permitted by the BG-0 oracle. We prove that PASTA achieves minimax optimal complexities across numerous non-convex regimes, including standard smooth, mean-square smooth, weakly convex, star-convex, and Polyak-Łojasiewicz functions, entirely under an unbounded domain and unbounded stochastic gradients.
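
To make the two ingredients named in the abstract concrete, here is a minimal Python sketch of an SGD loop that adds a Tikhonov term pulling toward an anchor and then applies a Halpern-style convex combination with that anchor. It is not the paper's PASTA update rule, which this summary does not spell out. The function names, the noise model, the `1/(k+2)` weight schedule, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np


def bg0_style_oracle(grad_f, x, x_ref, sigma0, sigma1, rng):
    """Hypothetical noisy gradient oracle (an assumption, not the paper's definition).

    The noise satisfies E||noise||^2 = sigma0^2 + sigma1^2 * ||x - x_ref||^2,
    so its variance grows quadratically with the distance from x_ref.
    """
    d = x.shape[0]
    scale = np.sqrt(sigma0**2 + sigma1**2 * np.sum((x - x_ref) ** 2))
    unit_noise = rng.standard_normal(d) / np.sqrt(d)  # E||unit_noise||^2 = 1
    return grad_f(x) + scale * unit_noise


def anchored_sgd(grad_f, x0, steps, eta, lam, sigma0, sigma1, seed=0):
    """Illustrative anchored SGD loop; the anchor is fixed at the starting point."""
    rng = np.random.default_rng(seed)
    anchor = x0.copy()
    x = x0.copy()
    for k in range(steps):
        g = bg0_style_oracle(grad_f, x, anchor, sigma0, sigma1, rng)
        # Tikhonov regularization: add the gradient of (lam / 2) * ||x - anchor||^2.
        g_reg = g + lam * (x - anchor)
        # Halpern-style anchoring: convex combination of the anchor and the SGD step.
        beta = 1.0 / (k + 2)
        x = beta * anchor + (1.0 - beta) * (x - eta * g_reg)
    return x


def toy_grad(x):
    """Gradient of the smooth non-convex toy function f(x) = sum(x^2 / (1 + x^2))."""
    return 2.0 * x / (1.0 + x**2) ** 2


if __name__ == "__main__":
    x_final = anchored_sgd(toy_grad, np.array([1.0, -2.0]), steps=200,
                           eta=0.05, lam=0.1, sigma0=0.1, sigma1=0.5)
    print(x_final)
```

Both the Tikhonov term and the Halpern weight keep the iterate from drifting far from the anchor, which in this noise model also limits how large the noise can become. How the paper schedules these quantities or updates the anchor is not stated in the summary.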