Generalization Guarantees on Data-Driven Tuning of Gradient Descent with Langevin Updates

arXiv cs.LG / 4/16/2026


Key Points

  • The paper introduces the Langevin Gradient Descent algorithm (LGD), which approximates the posterior mean implied by a convex regression loss and regularizer; its hyperparameters are then tuned via meta-learning.
  • It proves the existence of an optimal hyperparameter configuration under which LGD attains the Bayes-optimal predictor for squared loss in the studied convex regression setting.
  • The authors derive data-driven generalization guarantees for the meta-learning process that selects LGD hyperparameters from a set of tasks, using a pseudo-dimension bound that scales as O(dh) (up to logarithmic factors).
  • The work extends prior hyperparameter generalization results from elastic net (limited to h=2 hyperparameters) to a broader class of convex regression problems with larger hyperparameter spaces.
  • The paper includes preliminary empirical evidence that both LGD and the associated meta-learning procedure perform well on few-shot linear regression with synthetically generated datasets.
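The core idea behind LGD can be illustrated with a short sketch: run noisy (Langevin) gradient steps on a regularized squared loss and average the post-burn-in iterates to estimate the mean of the induced Gibbs posterior. This is a minimal illustration under assumed hyperparameter names (`lam`, `beta`, `eta`), not the paper's exact algorithm:

```python
import numpy as np

def lgd_posterior_mean(X, y, lam=0.01, beta=1000.0, eta=0.01,
                       n_steps=20000, burn_in=5000, seed=0):
    """Sketch of Langevin gradient descent for ridge-style regression.

    Runs the discretized Langevin update
        theta <- theta - eta * grad F(theta) + sqrt(2 * eta / beta) * xi
    on F(theta) = mean squared loss + lam * ||theta||^2, and averages
    the post-burn-in iterates as an estimate of the mean of the Gibbs
    posterior proportional to exp(-beta * F(theta)).
    Hyperparameter names and defaults here are illustrative.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    running_sum = np.zeros(d)
    for t in range(n_steps):
        # Gradient of the averaged squared loss plus the L2 regularizer
        grad = X.T @ (X @ theta - y) / n + 2 * lam * theta
        # Langevin update: gradient step plus isotropic Gaussian noise
        theta = theta - eta * grad + np.sqrt(2 * eta / beta) * rng.standard_normal(d)
        if t >= burn_in:
            running_sum += theta
    return running_sum / (n_steps - burn_in)
```

For large inverse temperature `beta`, the averaged iterates concentrate near the ridge minimizer, which is consistent with the paper's claim that a suitable hyperparameter configuration recovers the Bayes-optimal predictor in the convex setting.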

Abstract

We study learning to learn for regression problems through the lens of hyperparameter tuning. We propose the Langevin Gradient Descent algorithm (LGD), which approximates the mean of the posterior distribution defined by the loss function and regularizer of a convex regression task. We prove the existence of an optimal hyperparameter configuration for which the LGD algorithm achieves the Bayes-optimal solution for squared loss. Subsequently, we study generalization guarantees on meta-learning optimal hyperparameters for the LGD algorithm from a given set of tasks in the data-driven setting. For parameter dimension d and hyperparameter dimension h, we show a pseudo-dimension bound of O(dh), up to logarithmic factors, under mild assumptions on LGD. This matches the dimensional dependence of the bounds obtained in prior work for the elastic net, which only allows for h=2 hyperparameters, and extends their bounds to regression with convex losses. Finally, we show empirical evidence of the success of LGD and the meta-learning procedure for few-shot learning on linear regression using a few synthetically created datasets.
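The data-driven tuning step in the abstract amounts to choosing the hyperparameter configuration that performs best on average over a sample of tasks. A minimal sketch of that procedure, using a closed-form ridge solver as a stand-in for LGD's posterior-mean output and a one-dimensional grid of regularization weights (both assumptions, since the paper's hyperparameter space is larger):

```python
import numpy as np

def make_task(rng, d=5, n_train=10, n_val=20, noise=0.5):
    """Sample a synthetic few-shot linear regression task (illustrative setup)."""
    w = rng.standard_normal(d)
    X = rng.standard_normal((n_train + n_val, d))
    y = X @ w + noise * rng.standard_normal(n_train + n_val)
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])

def ridge_fit(X, y, lam):
    """Closed-form ridge solution, standing in for LGD's posterior-mean estimate."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def tune_on_tasks(tasks, lam_grid):
    """Pick the regularization weight minimizing average held-out squared loss."""
    avg_losses = []
    for lam in lam_grid:
        losses = []
        for (Xtr, ytr), (Xva, yva) in tasks:
            theta = ridge_fit(Xtr, ytr, lam)
            losses.append(np.mean((Xva @ theta - yva) ** 2))
        avg_losses.append(float(np.mean(losses)))
    best = int(np.argmin(avg_losses))
    return lam_grid[best], avg_losses
```

The paper's pseudo-dimension bound of O(dh) (up to log factors) controls how many sampled tasks are needed before a hyperparameter choice made this way generalizes to unseen tasks from the same distribution.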