Towards Noise-adaptive, Problem-adaptive (Accelerated) Stochastic Gradient Descent

arXiv stat.ML / 2026/3/24


Key Points

  • The paper studies how to make stochastic gradient descent (SGD) adaptive to both stochastic gradient noise (unknown variance) and problem-dependent constants when optimizing smooth, strongly-convex objectives.
  • It proves that SGD with exponentially decreasing step sizes can achieve a rate of \tilde{O}(\exp(-T/\kappa)+\sigma^2/T) without requiring prior knowledge of the noise variance \(\sigma^2\) (see the sketch after this list).
  • To adapt to unknown smoothness, the authors analyze a stochastic line-search (SLS), showing that it attains the target rate but only converges to a neighborhood of the optimum, whereas an offline smoothness estimate recovers convergence to the exact minimizer at a rate slowed in proportion to the estimation error.
  • The work further introduces an accelerated variant (ASGD) combining Nesterov acceleration with exponential step sizes, achieving a near-optimal \tilde{O}(\exp(-T/\sqrt{\kappa})+\sigma^2/T) rate without knowing \(\sigma^2\), and shows offline parameter estimates still ensure convergence though at reduced speed.
  • Empirical results show that exponential step sizes paired with a new SLS variant improve performance, consistent with the theoretical predictions.
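
For intuition, below is a minimal NumPy sketch of SGD with an exponentially decreasing step-size schedule of the form \eta_t = (1/\hat{L}) \alpha^t with \alpha = (\beta/T)^{1/T}. The schedule form, the function names, and the quadratic test problem are illustrative assumptions, not the paper's exact constants or experimental setup.

```python
import numpy as np

def sgd_exp_step(grad_fn, w0, T, L_hat, beta=1.0, rng=None):
    """SGD with an exponentially decreasing step-size schedule.

    grad_fn(w, rng) returns a stochastic gradient at w.
    L_hat is an estimate of the smoothness constant. The schedule
    eta_t = (1 / L_hat) * alpha**t with alpha = (beta / T)**(1 / T)
    is one common form of "exponential step sizes"; the exact
    constants used in the paper may differ (assumption).
    """
    rng = rng or np.random.default_rng(0)
    w = np.array(w0, dtype=float)
    alpha = (beta / T) ** (1.0 / T)  # decay factor so eta at step T is ~ beta / (L_hat * T)
    for t in range(T):
        eta_t = (1.0 / L_hat) * alpha ** t
        w -= eta_t * grad_fn(w, rng)
    return w

# Example: noisy quadratic f(w) = 0.5 * ||A w - b||^2 with additive gradient noise.
A = np.array([[3.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, -2.0])
noisy_grad = lambda w, rng: A.T @ (A @ w - b) + 0.1 * rng.standard_normal(2)
w_hat = sgd_exp_step(noisy_grad, w0=np.zeros(2), T=2000, L_hat=9.0)  # L_hat = largest eigenvalue of A^T A
```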

Abstract

We aim to make stochastic gradient descent (SGD) adaptive to (i) the noise \sigma^2 in the stochastic gradients and (ii) problem-dependent constants. When minimizing smooth, strongly-convex functions with condition number \kappa, we prove that T iterations of SGD with exponentially decreasing step-sizes and knowledge of the smoothness can achieve an \tilde{O} \left(\exp \left( \frac{-T}{\kappa} \right) + \frac{\sigma^2}{T} \right) rate, without knowing \sigma^2. In order to be adaptive to the smoothness, we use a stochastic line-search (SLS) and show (via upper and lower-bounds) that SGD with SLS converges at the desired rate, but only to a neighbourhood of the solution. On the other hand, we prove that SGD with an offline estimate of the smoothness converges to the minimizer. However, its rate is slowed down proportional to the estimation error. Next, we prove that SGD with Nesterov acceleration and exponential step-sizes (referred to as ASGD) can achieve the near-optimal \tilde{O} \left(\exp \left( \frac{-T}{\sqrt{\kappa}} \right) + \frac{\sigma^2}{T} \right) rate, without knowledge of \sigma^2. When used with offline estimates of the smoothness and strong-convexity, ASGD still converges to the solution, albeit at a slower rate. We empirically demonstrate the effectiveness of exponential step-sizes coupled with a novel variant of SLS.
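
To make the stochastic line-search concrete, the sketch below performs one SGD step with a backtracking Armijo search evaluated on the current mini-batch, in the spirit of SLS. The constants (initial step size, Armijo parameter c, backtracking factor) are illustrative assumptions, not the values analyzed in the paper.

```python
import numpy as np

def stochastic_armijo_step(f_i, grad_i, w, eta_max=1.0, c=0.5, backtrack=0.8, max_iters=50):
    """One SGD step with a backtracking stochastic line-search (SLS-style sketch).

    f_i and grad_i evaluate the loss and gradient on the *same* mini-batch.
    Starting from eta_max, the step size is shrunk by `backtrack` until the
    stochastic Armijo condition
        f_i(w - eta * g) <= f_i(w) - c * eta * ||g||^2
    holds. The constants eta_max, c, and backtrack are assumed values
    chosen for illustration.
    """
    g = grad_i(w)
    fw = f_i(w)
    eta = eta_max
    for _ in range(max_iters):
        if f_i(w - eta * g) <= fw - c * eta * np.dot(g, g):
            break
        eta *= backtrack
    return w - eta * g
```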