Lower Bounds and Proximally Anchored SGD for Non-Convex Minimization Under Unbounded Variance

arXiv cs.LG / 4/21/2026

Key Points

The paper analyzes stochastic first-order non-convex optimization when gradient variance is not uniformly bounded, focusing on the Blum-Gladyshev (BG-0) condition that allows variance to grow quadratically with distance.
It derives information-theoretic lower bounds showing that obtaining an ε-stationary point requires Ω(ε^-6) stochastic BG-0 oracle queries for smooth functions and Ω(ε^-4) queries under mean-square smoothness.
The results imply an unavoidable degradation relative to the classical bounded-variance complexities of Ω(ε^-4) (smooth) and Ω(ε^-3) (mean-square smooth), quantifying how much the oracle complexity must worsen under BG-0.
To match these lower bounds, the authors propose Proximally Anchored Stochastic Approximation (PASTA), which combines Halpern anchoring with Tikhonov regularization to control the extra variance blow-up permitted by BG-0.
They prove PASTA achieves minimax-optimal oracle complexities across multiple non-convex regimes (including smooth, mean-square smooth, weakly convex, star-convex, and Polyak–Łojasiewicz) even on unbounded domains with unbounded stochastic gradients.
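To make the gap in the bounds above concrete, the short Python sketch below tabulates only the ε-dependence of the quoted lower bounds. Constants, smoothness and variance parameters, and all other problem-dependent factors are deliberately dropped, so the numbers are orders of growth, not actual query counts. The helper name `lower_bound_orders` is hypothetical.

```python
import math

def lower_bound_orders(eps):
    """Epsilon-dependence of the quoted lower bounds.

    Constants and problem parameters are dropped on purpose, so these are
    growth orders for comparison only, not query counts.
    """
    return {
        "smooth, bounded variance": eps ** -4,
        "smooth, BG-0": eps ** -6,
        "mean-square smooth, bounded variance": eps ** -3,
        "mean-square smooth, BG-0": eps ** -4,
    }

for eps in (1e-1, 1e-2, 1e-3):
    b = lower_bound_orders(eps)
    # Ratio of the BG-0 bound to the bounded-variance bound in each setting.
    smooth_gap = b["smooth, BG-0"] / b["smooth, bounded variance"]
    ms_gap = b["mean-square smooth, BG-0"] / b["mean-square smooth, bounded variance"]
    print(f"eps={eps:g}: smooth gap ~ {smooth_gap:.0e}, "
          f"mean-square smooth gap ~ {ms_gap:.0e}")
```

Dividing the bounds shows that BG-0 costs an extra factor of ε^-2 for smooth functions and ε^-1 under mean-square smoothness; at ε = 10^-2 that is roughly 10^4 and 10^2, respectively.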

Abstract

Analysis of Stochastic Gradient Descent (SGD) and its variants typically relies on the assumption of uniformly bounded variance, a condition that frequently fails in practical non-convex settings, such as neural network training, as well as in several elementary optimization settings. While several relaxations have been explored in the literature, the Blum-Gladyshev (BG-0) condition, which permits the variance to grow quadratically with distance, has recently been shown to be the weakest condition. However, the oracle complexity of stochastic first-order non-convex optimization under BG-0 has remained underexplored. In this paper, we address this gap and establish information-theoretic lower bounds, proving that finding an $\epsilon$-stationary point requires $\Omega(\epsilon^{-6})$ stochastic BG-0 oracle queries for smooth functions and $\Omega(\epsilon^{-4})$ queries under mean-square smoothness. These limits demonstrate an unavoidable degradation from the classical bounded-variance complexities, i.e., $\Omega(\epsilon^{-4})$ and $\Omega(\epsilon^{-3})$ for the smooth and mean-square smooth cases, respectively. To match these lower bounds, we consider Proximally Anchored STochastic Approximation (PASTA), a unified algorithmic framework that couples Halpern anchoring with Tikhonov regularization to dynamically mitigate the extra variance-explosion term permitted by the BG-0 oracle. We prove that PASTA achieves minimax-optimal complexities across numerous non-convex regimes, including standard smooth, mean-square smooth, weakly convex, star-convex, and Polyak-Łojasiewicz functions, entirely on an unbounded domain and with unbounded stochastic gradients.
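The abstract names PASTA's two ingredients, Halpern anchoring and Tikhonov regularization, but not its update rule, step sizes, or regularization schedule. The Python sketch below is therefore only a hypothetical illustration of one simple way to combine the two ingredients; it is not the authors' PASTA. It assumes a noise model that meets a BG-0-style bound $\mathbb{E}\|g(x) - \nabla f(x)\|^2 \le \sigma_0^2 + \sigma_1^2 \|x - x_0\|^2$ with equality (the paper's exact statement of BG-0 may differ), takes the inner map $T$ to be one SGD step on the Tikhonov-regularized objective $f(x) + \tfrac{\lambda}{2}\|x - x_0\|^2$, and applies the Halpern update $x_{k+1} = \beta_k x_0 + (1 - \beta_k)\,T(x_k)$ with $\beta_k = 1/(k+2)$. The function names `bg0_gradient` and `anchored_regularized_sgd`, the toy objective, and all parameter values are illustrative choices.

```python
import math
import random

def bg0_gradient(grad_f, x, x0, sigma0, sigma1, rng):
    """Unbiased stochastic gradient under an ASSUMED BG-0-style noise model:
    E||g - grad_f(x)||^2 = sigma0^2 + sigma1^2 * ||x - x0||^2."""
    dist_sq = sum((xi - ai) ** 2 for xi, ai in zip(x, x0))
    # Split the total variance evenly across coordinates of isotropic Gaussian noise.
    scale = math.sqrt((sigma0 ** 2 + sigma1 ** 2 * dist_sq) / len(x))
    return [gi + scale * rng.gauss(0.0, 1.0) for gi in grad_f(x)]

def anchored_regularized_sgd(grad_f, x0, steps, eta, lam, sigma0, sigma1, seed=0):
    """Hypothetical illustration only, NOT the paper's PASTA update.

    Inner map T: one SGD step on f(x) + (lam / 2) * ||x - x0||^2.
    Outer update: Halpern anchoring toward x0,
        x_{k+1} = beta_k * x0 + (1 - beta_k) * T(x_k),  beta_k = 1 / (k + 2).
    """
    rng = random.Random(seed)
    x = list(x0)
    for k in range(steps):
        g = bg0_gradient(grad_f, x, x0, sigma0, sigma1, rng)
        # SGD step on the regularized objective; the Tikhonov term adds lam * (x - x0).
        t = [xi - eta * (gi + lam * (xi - ai)) for xi, gi, ai in zip(x, g, x0)]
        beta = 1.0 / (k + 2)
        # Halpern step: convex combination of the anchor x0 and T(x_k).
        x = [beta * ai + (1.0 - beta) * ti for ai, ti in zip(x0, t)]
    return x

# Toy objective f(x) = 0.5 * ||x - C||^2, chosen for illustration.
C = [3.0, -2.0]

def grad_f(x):
    return [xi - ci for xi, ci in zip(x, C)]

x_final = anchored_regularized_sgd(grad_f, [0.0, 0.0], steps=2000, eta=0.05,
                                   lam=0.01, sigma0=0.1, sigma1=0.5)
# For this f, ||grad f(x)|| equals the distance from x to C.
print("gradient norm at final iterate:", math.dist(x_final, C))
```

Intuitively, the anchor term pulls each iterate back toward $x_0$ with weight $\beta_k$ while the noise scale grows with $\|x - x_0\|$; the price is a small bias toward $x_0$ from both $\lambda$ and $\beta_k$, which shows up as a nonzero gradient norm even when the noise is switched off.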
