Taking the GP Out of the Loop

arXiv stat.ML / 5/6/2026


Key Points

  • Bayesian optimization is increasingly being applied to settings with cheaper evaluations and much larger numbers of observations, but Gaussian-process (GP) surrogates become a major computational bottleneck due to costly hyperparameter fitting.
  • The paper proposes Epistemic Nearest Neighbors (ENN), a lightweight surrogate that estimates both function values and uncertainty using K-nearest-neighbor observations, achieving O(N) scaling for fitting and acquisition.
  • It introduces TuRBO-ENN, which replaces the GP surrogate in TuRBO with ENN and uses UCB (μ(x)+σ(x)) instead of Thompson sampling for acquisition.
  • For noise-free problems, the method can even skip hyperparameter fitting by using a non-dominated sorting approach over predicted mean and uncertainty.
  • Experiments show that TuRBO-ENN can cut total proposal time (fitting plus acquisition) by 1–2 orders of magnitude versus TuRBO with up to 50,000 observations.
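To make the idea concrete, here is a minimal sketch of an ENN-style surrogate with the UCB acquisition described above. The exact estimator details (inverse-distance weighting for the mean, mean neighbor distance as the uncertainty proxy, the choice of K) are illustrative assumptions, not the paper's precise formulas; the point is that a single prediction touches all N observations only once, giving the O(N) scaling the paper claims.

```python
import numpy as np

def enn_predict(X_obs, y_obs, x, k=5):
    """Sketch of an Epistemic-Nearest-Neighbors-style surrogate.

    Assumptions (not the paper's exact estimator): the mean is an
    inverse-distance-weighted average of the K nearest observations,
    and the uncertainty grows with the distance to those neighbors.
    """
    d = np.linalg.norm(X_obs - x, axis=1)   # distances to all N points: O(N)
    idx = np.argpartition(d, k)[:k]         # indices of the K nearest neighbors
    dk = d[idx]
    w = 1.0 / (dk + 1e-12)                  # inverse-distance weights
    mu = np.dot(w, y_obs[idx]) / w.sum()    # weighted-mean prediction
    sigma = dk.mean()                       # epistemic proxy: mean neighbor distance
    return mu, sigma

def ucb(X_obs, y_obs, x, k=5):
    """UCB acquisition as in the paper: mu(x) + sigma(x) (maximization)."""
    mu, sigma = enn_predict(X_obs, y_obs, x, k)
    return mu + sigma
```

In a TuRBO-style loop, `ucb` would simply be maximized over candidate points inside the trust region, with no GP hyperparameter refit between iterations.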

Abstract

Bayesian optimization (BO) has traditionally solved black-box problems where function evaluation is expensive and, therefore, observations are few. Recently, however, there has been growing interest in applying BO to problems where function evaluation is cheaper and observations are more plentiful. In this regime, scaling to many observations N is impeded by Gaussian-process (GP) surrogates: GP hyperparameter fitting scales as O(N³) (reduced to roughly O(N²) in modern implementations), and it is repeated at every BO iteration. Many methods improve scaling at acquisition time, but hyperparameter fitting still scales poorly, making it the bottleneck. We propose Epistemic Nearest Neighbors (ENN), a lightweight alternative to GPs that estimates function values and uncertainty (epistemic and aleatoric) from K-nearest-neighbor observations. ENN scales as O(N) for both fitting and acquisition. Our BO method, TuRBO-ENN, replaces the GP surrogate in TuRBO with ENN and its Thompson-sampling acquisition with UCB = μ(x) + σ(x). For the special case of noise-free problems, we can omit fitting altogether by replacing UCB with a non-dominated sort over μ(x) and σ(x). We show empirically that TuRBO-ENN reduces proposal time (i.e., fitting time + acquisition time) by one to two orders of magnitude compared to TuRBO at up to 50,000 observations.
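The noise-free variant replaces the weighted UCB sum with a Pareto criterion over the two quantities it would otherwise combine, so no trade-off hyperparameter needs fitting. A hedged sketch of such a non-dominated filter follows; the quadratic pairwise check and the maximization convention (high mean, high uncertainty both desirable, consistent with UCB = μ(x) + σ(x) above) are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

def non_dominated(mu, sigma):
    """Return indices of candidates that are Pareto-optimal in
    (predicted mean, predicted uncertainty), assuming maximization of both.

    Illustrative O(M^2) pairwise check over M candidates; the paper's
    implementation may use a faster sorting scheme.
    """
    M = len(mu)
    keep = np.ones(M, dtype=bool)
    for i in range(M):
        for j in range(M):
            # j dominates i if it is no worse in both objectives
            # and strictly better in at least one
            if (mu[j] >= mu[i] and sigma[j] >= sigma[i]
                    and (mu[j] > mu[i] or sigma[j] > sigma[i])):
                keep[i] = False
                break
    return np.flatnonzero(keep)
```

Any point on the returned front can then be proposed for evaluation, sidestepping the choice of an explicit exploration weight.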
