Instance-optimal stochastic convex optimization: Can we improve upon sample-average and robust stochastic approximation?

arXiv stat.ML / 3/27/2026

Key Points

  • The paper analyzes unconstrained stochastic convex optimization under a stochastic oracle that introduces both additive and multiplicative noise, a canonical setting that arises across operations research, signal processing, and machine learning (one common formalization of such an oracle is sketched after this list).
  • It shows that standard approaches such as sample-average approximation and robust (or averaged) stochastic approximation can yield suboptimal, and in some cases arbitrarily poor, performance at realistic finite sample sizes.
  • The authors introduce VISOR, a variance-reduction strategy that attains substantially better performance than those baselines while using the same sample size.
  • Theoretical results include finite-sample, information-theoretic local minimax lower bounds and an instance-dependent characterization of what limits any estimator’s performance.
  • An accelerated VISOR variant is shown to be instance-optimal up to logarithmic factors, and applications to generalized linear models yield improved non-asymptotic, instance-dependent generalization error bounds for stochastic methods, even in the classical case of linear regression.

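The summary does not spell out the oracle model, but a common way to formalize a stochastic gradient oracle with both additive and multiplicative noise is the following; the notation (sigma_add, sigma_mult) is ours, and the paper's exact assumptions may differ.

```latex
% A common additive-plus-multiplicative noise oracle (notation ours;
% the paper's precise assumptions may differ): unbiased stochastic
% gradients whose variance may grow with the true gradient's magnitude.
\mathbb{E}\bigl[g(x;\xi)\bigr] = \nabla F(x),
\qquad
\mathbb{E}\bigl\|g(x;\xi) - \nabla F(x)\bigr\|^{2}
  \le \sigma_{\mathrm{add}}^{2}
    + \sigma_{\mathrm{mult}}^{2}\,\bigl\|\nabla F(x)\bigr\|^{2},
% where F is L-smooth and \mu-strongly convex; setting sigma_mult = 0
% recovers the classical purely additive-noise oracle.
```
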
Abstract

We study the unconstrained minimization of a smooth and strongly convex population loss function under a stochastic oracle that introduces both additive and multiplicative noise; this is a canonical and widely studied setting that arises across operations research, signal processing, and machine learning. We begin by showing that standard approaches such as sample average approximation and robust (or averaged) stochastic approximation can lead to suboptimal, and in some cases arbitrarily poor, performance with realistic finite sample sizes. In contrast, we demonstrate that a carefully designed variance reduction strategy, which we term VISOR for short, can significantly outperform these approaches while using the same sample size. Our upper bounds are complemented by finite-sample, information-theoretic local minimax lower bounds, which highlight fundamental, instance-dependent factors that govern the performance of any estimator. Taken together, these results demonstrate that an accelerated variant of VISOR is instance-optimal, achieving the best possible sample complexity up to logarithmic factors while also attaining optimal oracle complexity. We apply our theory to generalized linear models and improve upon classical results. In particular, we obtain the best-known non-asymptotic, instance-dependent generalization error bounds for stochastic methods, even in linear regression.
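
VISOR itself is not specified in this summary, so as a concrete reference point, here is a minimal runnable sketch of the robust / averaged stochastic approximation baseline (Polyak-Ruppert averaging with decaying steps) under one plausible additive-plus-multiplicative noise oracle. The quadratic instance, the noise model, and the step-size schedule are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic strongly convex instance: F(x) = 0.5 x^T A x - b^T x,
# with the eigenvalues of A in [mu, L] = [0.5, 5.0]; x* = A^{-1} b.
d = 10
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
A = U @ np.diag(np.linspace(0.5, 5.0, d)) @ U.T
b = rng.standard_normal(d)
x_star = np.linalg.solve(A, b)

def noisy_grad(x, sigma_add=1.0, sigma_mult=0.5):
    """Unbiased gradient oracle with additive AND multiplicative noise:
    g = (1 + eta) * grad F(x) + eps, eta ~ N(0, sigma_mult^2),
    eps ~ N(0, sigma_add^2 I). One plausible reading of the oracle;
    the paper's exact model may differ."""
    g = A @ x - b
    eta = rng.normal(0.0, sigma_mult)
    eps = rng.normal(0.0, sigma_add, size=d)
    return (1.0 + eta) * g + eps

def averaged_sa(n, step=0.05):
    """Robust / averaged stochastic approximation: plain SGD iterates
    with O(1/sqrt(t)) steps; the estimate is their running average."""
    x = np.zeros(d)
    avg = np.zeros(d)
    for t in range(1, n + 1):
        x = x - step / np.sqrt(t) * noisy_grad(x)
        avg += (x - avg) / t          # online Polyak-Ruppert average
    return avg

est = averaged_sa(n=20_000)
print("excess loss:", 0.5 * (est - x_star) @ A @ (est - x_star))
```

The paper's point, as summarized above, is that on hard instances this baseline (like sample-average approximation) can be far from optimal at realistic sample sizes, and that a variance-reduced scheme such as VISOR can do substantially better with the same sample budget.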