Adaptive Candidate Point Thompson Sampling for High-Dimensional Bayesian Optimization

arXiv cs.LG / 4/13/2026


Key Points

  • The paper addresses a key limitation of Thompson sampling in Bayesian optimization with Gaussian process surrogates: exact posterior sampling of the objective's maximizer is intractable, and discretized candidate sets become exponentially sparse as dimensionality grows.
  • It proposes Adaptive Candidate Thompson Sampling (ACTS), which increases effective candidate density by adaptively shrinking the sampling search space instead of relying solely on denser discretizations or scalable GP approximations.
  • ACTS generates candidate points in lower-dimensional subspaces, using the gradient of a sampled surrogate function to guide where to search.
  • The method is presented as a simple drop-in replacement for existing Thompson sampling variants (including trust-region/local-approximation approaches) while yielding better maximizer samples and improved optimization results.
  • Experiments on both synthetic and real-world benchmarks indicate ACTS improves optimization performance over prior Thompson sampling strategies.

Abstract

In Bayesian optimization, Thompson sampling (TS) selects the next evaluation point by sampling from the posterior distribution over the objective function maximizer. Because this sampling problem is intractable for Gaussian process (GP) surrogates, the posterior distribution is typically restricted to fixed discretizations (i.e., candidate points) that become exponentially sparse as dimensionality increases. While previous works aim to increase candidate point density through scalable GP approximations, our orthogonal approach increases density by adaptively reducing the search space during sampling. Specifically, we introduce Adaptive Candidate Thompson Sampling (ACTS), which generates candidate points in subspaces guided by the gradient of a surrogate model sample. ACTS is a simple drop-in replacement for existing TS methods -- including those that use trust regions or other local approximations -- producing better samples of maxima and improved optimization across synthetic and real-world benchmarks.
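To make the core idea concrete, here is a minimal sketch of a gradient-guided subspace candidate step in the spirit of ACTS. This is not the paper's implementation: the surrogate sample is a hypothetical toy function standing in for a GP posterior draw, the gradient is taken by finite differences, and the subspace dimension, candidate count, and radius are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sampled_surrogate(x):
    # Stand-in for one posterior function sample from a GP surrogate
    # (hypothetical toy objective; the real method draws from the GP).
    return -np.sum((x - 0.5) ** 2, axis=-1)

def numerical_grad(f, x, eps=1e-5):
    # Central finite-difference gradient of the sampled surrogate at x.
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def subspace_candidates(f, center, dim_sub=2, n_cand=256, radius=0.2):
    """Generate candidate points in a low-dimensional affine subspace
    anchored at `center`, with one basis direction aligned to the
    gradient of the surrogate sample (illustrative parameters)."""
    d = center.size
    g = numerical_grad(f, center)
    g = g / (np.linalg.norm(g) + 1e-12)
    # Gradient direction plus random directions, orthonormalized via QR.
    basis = np.column_stack(
        [g] + [rng.standard_normal(d) for _ in range(dim_sub - 1)]
    )
    q, _ = np.linalg.qr(basis)            # d x dim_sub orthonormal basis
    coeffs = rng.uniform(-radius, radius, (n_cand, dim_sub))
    return center + coeffs @ q.T          # candidates in the subspace

# One TS step: maximize the surrogate sample over the adaptive candidates.
d = 20
center = rng.uniform(0.0, 1.0, d)         # e.g. the current best point
cands = subspace_candidates(sampled_surrogate, center)
best = cands[np.argmax(sampled_surrogate(cands))]
```

The payoff of restricting candidates to a low-dimensional subspace is that a fixed candidate budget (here 256 points) stays dense within the subspace, whereas the same budget scattered over the full 20-dimensional box would be exponentially sparse.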