Learning Operators by Regularized Stochastic Gradient Descent with Operator-valued Kernels

arXiv stat.ML / 4/28/2026

Key Points

  • The paper studies statistical inverse problems where the goal is to learn an operator mapping from a Polish space to a separable Hilbert space, with targets living in a vector-valued RKHS defined via an operator-valued kernel.
  • It analyzes regularized stochastic gradient descent (SGD) in both online and finite-horizon regimes, using polynomially decaying learning rates and regularization parameters for online updates and fixed hyperparameters for finite-horizon training (a minimal sketch of this update appears after this list).
  • Under structural and distributional assumptions, the authors prove dimension-independent bounds on prediction and estimation errors, yielding near-optimal convergence rates in expectation.
  • The work also derives high-probability error estimates and shows they lead to almost sure convergence, while introducing a general method for obtaining high-probability guarantees in infinite-dimensional settings.
  • Applications to structured prediction and parametric PDEs demonstrate the practical scope of the framework and show how it can be instantiated in concrete problem settings.
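
To make the update rule in the second point concrete, here is a minimal sketch of regularized SGD in a vector-valued RKHS, assuming a separable operator-valued kernel K(x, z) = k(x, z)·B with a Gaussian scalar kernel k and a fixed positive semi-definite output matrix B. The iterate is stored as a kernel expansion whose coefficients are shrunk by the regularization term and extended by one new term per sample. The function names, decay exponents, and schedule constants are illustrative assumptions, not the paper's construction or its tuned rates.

```python
import numpy as np


def separable_kernel(x, z, B, lengthscale=1.0):
    """Illustrative separable operator-valued kernel K(x, z) = k(x, z) * B,
    where k is a Gaussian scalar kernel on the input space and B is a fixed
    positive semi-definite matrix acting on a finite-dimensional stand-in
    for the output Hilbert space."""
    k = np.exp(-np.sum((x - z) ** 2) / (2.0 * lengthscale ** 2))
    return k * B


def regularized_kernel_sgd(data, B, gamma0=0.5, lam0=0.1, online=True, horizon=None):
    """Regularized SGD for the iterate f_t = sum_i K(., x_i) a_i.

    Each step performs
        f_{t+1} = (1 - gamma_t * lam_t) * f_t - gamma_t * K(., x_t) (f_t(x_t) - y_t),
    i.e. shrink every stored coefficient (Tikhonov term) and append one new
    coefficient (stochastic gradient of the squared loss).  Online mode uses
    polynomially decaying gamma_t and lam_t; finite-horizon mode keeps them
    fixed as functions of the horizon.  The exponents and constants below are
    placeholder choices, not the rates analyzed in the paper.
    """
    supports, coeffs = [], []
    out_dim = B.shape[0]
    for t, (x_t, y_t) in enumerate(data, start=1):
        if online:
            gamma_t = gamma0 * t ** (-0.5)   # decaying step size
            lam_t = lam0 * t ** (-0.25)      # decaying regularization
        else:
            gamma_t = gamma0 / np.sqrt(horizon)  # fixed, horizon-dependent
            lam_t = lam0 / np.sqrt(horizon)

        # Evaluate the current iterate: f_t(x_t) = sum_i K(x_t, x_i) a_i.
        pred = np.zeros(out_dim)
        for x_i, a_i in zip(supports, coeffs):
            pred += separable_kernel(x_t, x_i, B) @ a_i

        # Shrink existing coefficients, then add the new kernel section.
        coeffs = [(1.0 - gamma_t * lam_t) * a_i for a_i in coeffs]
        supports.append(x_t)
        coeffs.append(-gamma_t * (pred - y_t))
    return supports, coeffs


# Tiny usage example on synthetic data (2-d inputs, 3-d outputs).
rng = np.random.default_rng(0)
B = np.eye(3)
stream = [(rng.normal(size=2), rng.normal(size=3)) for _ in range(200)]
supports, coeffs = regularized_kernel_sgd(stream, B, online=True)
```

Storing the iterate as a growing kernel expansion mirrors the representer-style structure of kernel SGD; the finite-dimensional matrix B stands in for the separable Hilbert output space purely for illustration.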

Abstract

We consider a class of statistical inverse problems involving the estimation of a regression operator from a Polish space to a separable Hilbert space, where the target lies in a vector-valued reproducing kernel Hilbert space induced by an operator-valued kernel. To address the associated ill-posedness, we analyze regularized stochastic gradient descent (SGD) algorithms in both online and finite-horizon settings. The former uses polynomially decaying step sizes and regularization parameters, while the latter adopts fixed values. Under suitable structural and distributional assumptions, we establish dimension-independent bounds for prediction and estimation errors. The resulting convergence rates are near-optimal in expectation, and we also derive high-probability estimates that imply almost sure convergence. Our analysis introduces a general technique for obtaining high-probability guarantees in infinite-dimensional settings. We illustrate the practical scope of our framework with applications to structured prediction and parametric PDEs.
