Learning Operators by Regularized Stochastic Gradient Descent with Operator-valued Kernels

arXiv stat.ML / 4/28/2026

Key Points

  • The paper studies statistical inverse problems where the goal is to learn an operator mapping from a Polish space to a separable Hilbert space, with targets living in a vector-valued RKHS defined via an operator-valued kernel.
  • It analyzes regularized stochastic gradient descent (SGD) in both online and finite-horizon regimes, using polynomially decaying learning rates and regularization parameters for online updates and fixed hyperparameters for finite-horizon training (a minimal sketch of this update appears after this list).
  • Under structural and distributional assumptions, the authors prove dimension-independent bounds on prediction and estimation errors, yielding near-optimal convergence rates in expectation.
  • The work also derives high-probability error estimates and shows they lead to almost sure convergence, while introducing a general method for obtaining high-probability guarantees in infinite-dimensional settings.
  • Applications to structured prediction and parametric PDEs demonstrate the practical scope of the framework and show how it can be instantiated in concrete problem settings.
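
To make the update rule in the second point concrete, here is a minimal sketch of regularized SGD in a vector-valued RKHS, assuming a separable operator-valued kernel K(x, z) = k(x, z)·B with a Gaussian scalar kernel k and a fixed positive semi-definite output matrix B. The iterate is stored as a kernel expansion whose coefficients are shrunk by the regularization term and extended by one new term per sample. The function names, decay exponents, and schedule constants are illustrative assumptions, not the paper's construction or its tuned rates.

```python
import numpy as np


def separable_kernel(x, z, B, lengthscale=1.0):
    """Illustrative separable operator-valued kernel K(x, z) = k(x, z) * B,
    where k is a Gaussian scalar kernel on the input space and B is a fixed
    positive semi-definite matrix acting on a finite-dimensional stand-in
    for the output Hilbert space."""
    k = np.exp(-np.sum((x - z) ** 2) / (2.0 * lengthscale ** 2))
    return k * B


def regularized_kernel_sgd(data, B, gamma0=0.5, lam0=0.1, online=True, horizon=None):
    """Regularized SGD for the iterate f_t = sum_i K(., x_i) a_i.

    Each step performs
        f_{t+1} = (1 - gamma_t * lam_t) * f_t - gamma_t * K(., x_t) (f_t(x_t) - y_t),
    i.e. shrink every stored coefficient (Tikhonov term) and append one new
    coefficient (stochastic gradient of the squared loss).  Online mode uses
    polynomially decaying gamma_t and lam_t; finite-horizon mode keeps them
    fixed as functions of the horizon.  The exponents and constants below are
    placeholder choices, not the rates analyzed in the paper.
    """
    supports, coeffs = [], []
    out_dim = B.shape[0]
    for t, (x_t, y_t) in enumerate(data, start=1):
        if online:
            gamma_t = gamma0 * t ** (-0.5)   # decaying step size
            lam_t = lam0 * t ** (-0.25)      # decaying regularization
        else:
            gamma_t = gamma0 / np.sqrt(horizon)  # fixed, horizon-dependent
            lam_t = lam0 / np.sqrt(horizon)

        # Evaluate the current iterate: f_t(x_t) = sum_i K(x_t, x_i) a_i.
        pred = np.zeros(out_dim)
        for x_i, a_i in zip(supports, coeffs):
            pred += separable_kernel(x_t, x_i, B) @ a_i

        # Shrink existing coefficients, then add the new kernel section.
        coeffs = [(1.0 - gamma_t * lam_t) * a_i for a_i in coeffs]
        supports.append(x_t)
        coeffs.append(-gamma_t * (pred - y_t))
    return supports, coeffs


# Tiny usage example on synthetic data (2-d inputs, 3-d outputs).
rng = np.random.default_rng(0)
B = np.eye(3)
stream = [(rng.normal(size=2), rng.normal(size=3)) for _ in range(200)]
supports, coeffs = regularized_kernel_sgd(stream, B, online=True)
```

Storing the iterate as a growing kernel expansion mirrors the representer-style structure of kernel SGD; the finite-dimensional matrix B stands in for the separable Hilbert output space purely for illustration.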

Abstract

We consider a class of statistical inverse problems involving the estimation of a regression operator from a Polish space to a separable Hilbert space, where the target lies in a vector-valued reproducing kernel Hilbert space induced by an operator-valued kernel. To address the associated ill-posedness, we analyze regularized stochastic gradient descent (SGD) algorithms in both online and finite-horizon settings. The former uses polynomially decaying step sizes and regularization parameters, while the latter adopts fixed values. Under suitable structural and distributional assumptions, we establish dimension-independent bounds for prediction and estimation errors. The resulting convergence rates are near-optimal in expectation, and we also derive high-probability estimates that imply almost sure convergence. Our analysis introduces a general technique for obtaining high-probability guarantees in infinite-dimensional settings. We illustrate the practical scope of our framework with applications to structured prediction and parametric PDEs.
