Lower Bounds and Proximally Anchored SGD for Non-Convex Optimization under Unbounded Variance

arXiv cs.LG / 2026/4/21

Key Points

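The paper analyzes stochastic first-order non-convex optimization when the gradient variance is not uniformly bounded, focusing on the Blum-Gladyshev (BG-0) condition, which allows the variance to grow quadratically with distance.
It establishes information-theoretic lower bounds: finding an ε-stationary point requires Ω(ε^-6) stochastic BG-0 oracle queries for smooth functions and Ω(ε^-4) queries under mean-square smoothness.
These bounds quantify an unavoidable degradation relative to the classical bounded-variance complexities of Ω(ε^-4) in the smooth case and Ω(ε^-3) in the mean-square smooth case.
To match the lower bounds, the paper considers Proximally Anchored STochastic Approximation (PASTA), which combines Halpern anchoring with Tikhonov regularization to dynamically suppress the variance explosion that the BG-0 oracle permits.
PASTA is proven to achieve minimax-optimal oracle complexity across several non-convex regimes, including smooth, mean-square smooth, weakly convex, star-convex, and Polyak-Łojasiewicz functions, even on an unbounded domain with unbounded stochastic gradients.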
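
For orientation, the variance condition and the lower bounds listed above can be written compactly. The exact form of BG-0 is not given in this summary, so the inequality below is an assumed formulation of "variance growing quadratically with distance"; the reference point $x_0$ and the constants $\sigma_0$, $\sigma_1$ are illustrative symbols, not the paper's notation. Setting $\sigma_1 = 0$ recovers the usual uniformly bounded variance assumption.

```latex
% Assumed form of the BG-0 condition (x_0, \sigma_0, \sigma_1 are illustrative symbols)
\mathbb{E}_{\xi}\bigl\| g(x;\xi) - \nabla f(x) \bigr\|^2 \le \sigma_0^2 + \sigma_1^2 \| x - x_0 \|^2

% Lower bounds on oracle queries for an \epsilon-stationary point, as stated above
\begin{array}{lcc}
\text{Setting} & \text{Bounded variance} & \text{BG-0} \\
\text{Smooth} & \Omega(\epsilon^{-4}) & \Omega(\epsilon^{-6}) \\
\text{Mean-square smooth} & \Omega(\epsilon^{-3}) & \Omega(\epsilon^{-4})
\end{array}
```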

Abstract

Analysis of Stochastic Gradient Descent (SGD) and its variants typically relies on the assumption of uniformly bounded variance, a condition that frequently fails in practical non-convex settings, such as neural network training, as well as in several elementary optimization settings. While several relaxations are explored in the literature, the Blum-Gladyshev (BG-0) condition, which permits the variance to grow quadratically with distance, has recently been shown to be the weakest condition. However, the study of the oracle complexity of stochastic first-order non-convex optimization under BG-0 has remained underexplored. In this paper, we address this gap and establish information-theoretic lower bounds, proving that finding an $\epsilon$-stationary point requires $\Omega(\epsilon^{-6})$ stochastic BG-0 oracle queries for smooth functions and $\Omega(\epsilon^{-4})$ queries under mean-square smoothness. These limits demonstrate an unavoidable degradation from classical bounded-variance complexities, i.e., $\Omega(\epsilon^{-4})$ and $\Omega(\epsilon^{-3})$ for smooth and mean-square smooth cases, respectively. To match these lower bounds, we consider Proximally Anchored STochastic Approximation (PASTA), a unified algorithmic framework that couples Halpern anchoring with Tikhonov regularization to dynamically mitigate the extra variance explosion term permitted by the BG-0 oracle. We prove that PASTA achieves minimax optimal complexities across numerous non-convex regimes, including standard smooth, mean-square smooth, weakly convex, star-convex, and Polyak-Łojasiewicz functions, entirely under an unbounded domain and unbounded stochastic gradients.
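
The abstract names the two ingredients (Halpern anchoring and Tikhonov regularization) but does not give PASTA's update rule, step sizes, or parameter schedules. The following is therefore only a minimal sketch of how such ingredients might be combined, not the paper's algorithm. Everything in it is an assumption made for illustration: the function names `bg0_oracle`, `anchored_step`, and `run_anchored_sgd`, the Gaussian noise model, the weight schedule `beta = 1 / (k + 2)`, and the toy objective.

```python
import numpy as np


def bg0_oracle(grad_f, x, x_ref, sigma, rng):
    """Noisy gradient with E||noise||^2 = sigma^2 * (1 + ||x - x_ref||^2).

    The Gaussian noise model is an illustrative assumption; it only mimics
    variance that grows quadratically with the distance from x_ref.
    """
    d = x.shape[0]
    scale = sigma * np.sqrt(1.0 + np.sum((x - x_ref) ** 2))
    noise = rng.standard_normal(d) * scale / np.sqrt(d)
    return grad_f(x) + noise


def anchored_step(x, anchor, g, eta, lam, beta):
    """One hypothetical anchored, regularized SGD step.

    lam * (x - anchor) is the gradient of the Tikhonov term
    (lam / 2) * ||x - anchor||^2; the convex combination with `anchor`
    is a Halpern-style pull toward the anchor point.
    """
    g_reg = g + lam * (x - anchor)
    x_half = x - eta * g_reg
    return beta * anchor + (1.0 - beta) * x_half


def run_anchored_sgd(grad_f, x0, steps, eta=0.05, lam=0.1, sigma=0.5, seed=0):
    """Run the sketch with the anchor fixed at the starting point."""
    rng = np.random.default_rng(seed)
    anchor = np.array(x0, dtype=float)
    x = anchor.copy()
    for k in range(steps):
        g = bg0_oracle(grad_f, x, anchor, sigma, rng)
        beta = 1.0 / (k + 2)  # assumed Halpern-style weight schedule
        x = anchored_step(x, anchor, g, eta, lam, beta)
    return x


if __name__ == "__main__":
    # Toy non-convex objective f(x) = 0.5 * ||x||^2 + 2 * sum(cos(x_i)).
    toy_grad = lambda x: x - 2.0 * np.sin(x)
    x_final = run_anchored_sgd(toy_grad, [4.0, -3.0], steps=500)
    print("final iterate:", x_final)
    print("gradient norm:", np.linalg.norm(toy_grad(x_final)))
```

Both mechanisms in this sketch keep the iterate from drifting far from the anchor, which is where the noise of such an oracle would be largest. How the paper actually chooses the anchor, the regularization strength, and the weights is not stated in the abstract.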
