Multi-Agent LLMs for Adaptive Acquisition in Bayesian Optimization

arXiv cs.AI / 4/1/2026


Key Points

  • The paper studies how LLM-based black-box optimization implicitly manages the exploration–exploitation trade-off, contrasting it with Bayesian Optimization where this balance is explicitly encoded in acquisition functions.
  • It analyzes how different operational definitions of exploration (informativeness, diversity, and representativeness) influence LLM-mediated search policy learning and the resulting search dynamics.
  • The authors find that single-agent, prompt-based approaches that combine strategy selection and candidate generation often experience cognitive overload, producing unstable behavior and premature convergence.
  • To improve control and stability, they introduce a multi-agent framework that separates strategic policy mediation (assigning interpretable weights to exploration criteria) from tactical candidate generation (producing candidates conditioned on those weights).
  • Experiments on multiple continuous optimization benchmarks show that decomposing strategic control from candidate generation significantly improves the effectiveness of LLM-mediated search.
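The Bayesian Optimization baseline the first point contrasts against makes the trade-off explicit in an acquisition function. A minimal illustration (not from the paper) is Upper Confidence Bound, where a single coefficient `kappa` balances the posterior mean (exploitation) against the posterior uncertainty (exploration); all numbers below are illustrative:

```python
import numpy as np

def ucb(mu, sigma, kappa=2.0):
    """Upper Confidence Bound: posterior mean (exploitation)
    plus an uncertainty bonus (exploration), weighted by kappa."""
    return mu + kappa * sigma

# Toy posterior over 5 candidate points (illustrative values only).
mu = np.array([0.9, 0.5, 0.2, 0.7, 0.1])      # predicted mean
sigma = np.array([0.05, 0.4, 0.9, 0.1, 1.0])  # predictive std

# kappa=0 is pure exploitation (picks the highest-mean candidate);
# large kappa favors uncertain, unexplored regions instead.
greedy_pick = int(np.argmax(ucb(mu, sigma, kappa=0.0)))   # index 0
explore_pick = int(np.argmax(ucb(mu, sigma, kappa=2.0)))  # index 4
```

In LLM-based optimization there is no such explicit `kappa`: the balance emerges implicitly from prompt-based reasoning, which is what makes it hard to analyze or control.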

Abstract

The exploration–exploitation trade-off is central to sequential decision-making and black-box optimization, yet how Large Language Models (LLMs) reason about and manage this trade-off remains poorly understood. Unlike Bayesian Optimization, where exploration and exploitation are explicitly encoded through acquisition functions, LLM-based optimization relies on implicit, prompt-based reasoning over historical evaluations, making search behavior difficult to analyze or control. In this work, we present a metric-level study of LLM-mediated search policy learning, examining how LLMs construct and adapt exploration–exploitation strategies under multiple operational definitions of exploration, including informativeness, diversity, and representativeness. We show that single-agent LLM approaches, which jointly perform strategy selection and candidate generation within a single prompt, suffer from cognitive overload, leading to unstable search dynamics and premature convergence. To address this limitation, we propose a multi-agent framework that decomposes exploration–exploitation control into strategic policy mediation and tactical candidate generation. A strategy agent assigns interpretable weights to multiple search criteria, while a generation agent produces candidates conditioned on the resulting weight-defined search policy. This decomposition renders exploration–exploitation decisions explicit, observable, and adjustable. Empirical results across various continuous optimization benchmarks indicate that separating strategic control from candidate generation substantially improves the effectiveness of LLM-mediated search.
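The strategy/generation split described above can be sketched as two cooperating functions. This is a hypothetical stand-in, not the paper's implementation: the real framework uses LLM agents, and the criterion names, scoring rules, and weight schedule below are all illustrative assumptions for a 1-D maximization problem.

```python
import numpy as np

# Criteria the strategy agent weighs; the exploration criteria mirror the
# paper's operational definitions (the scoring heuristics are assumptions).
CRITERIA = ["exploitation", "informativeness", "diversity", "representativeness"]

def strategy_agent(step, horizon):
    """Assigns interpretable weights to the search criteria.
    Stand-in policy: shift weight from exploration to exploitation over time."""
    explore = 1.0 - step / horizon
    w = np.array([1.0 - explore, explore / 3, explore / 3, explore / 3])
    return w / w.sum()

def generation_agent(weights, candidates, history_x, history_y):
    """Scores candidates under the weight-defined policy; returns the best."""
    hx = np.array(history_x)
    best_x = history_x[int(np.argmax(history_y))]
    scores = []
    for x in candidates:
        dists = np.abs(hx - x)
        crit = np.array([
            -abs(x - best_x),           # exploitation: near the incumbent best
            dists.min(),                # informativeness: far from evaluated points
            dists.mean(),               # diversity: spread across the space
            -abs(x - hx.mean()),        # representativeness: near the data's mass
        ])
        scores.append(float(weights @ crit))
    return candidates[int(np.argmax(scores))]

history_x, history_y = [0.1, 0.5, 0.9], [0.2, 0.8, 0.3]
candidates = [0.0, 0.45, 0.55, 1.0]
w = strategy_agent(step=0, horizon=10)   # early: weight on exploration criteria
x_next = generation_agent(w, candidates, history_x, history_y)
```

The point of the decomposition is that `w` is an explicit, inspectable object: the trade-off can be read off (and adjusted) directly, instead of being buried in a single prompt that must both choose a strategy and generate candidates.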