Multi-Agent LLMs for Adaptive Acquisition in Bayesian Optimization

arXiv cs.AI / 4/1/2026


Key Points

  • The paper studies how LLM-based black-box optimization implicitly manages the exploration–exploitation trade-off, contrasting it with Bayesian Optimization where this balance is explicitly encoded in acquisition functions.
  • It analyzes how different operational definitions of exploration (informativeness, diversity, and representativeness) influence LLM-mediated search policy learning and the resulting search dynamics.
  • The authors find that single-agent, prompt-based approaches that combine strategy selection and candidate generation often experience cognitive overload, producing unstable behavior and premature convergence.
  • To improve control and stability, they introduce a multi-agent framework that separates strategic policy mediation (assigning interpretable weights to exploration criteria) from tactical candidate generation (producing candidates conditioned on those weights).
  • Experiments on multiple continuous optimization benchmarks show that decomposing strategic control from candidate generation significantly improves the effectiveness of LLM-mediated search.
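The Bayesian Optimization baseline the first point contrasts against makes the trade-off explicit in an acquisition function. A minimal illustration (not from the paper) is Upper Confidence Bound, where a single coefficient `kappa` balances the posterior mean (exploitation) against the posterior uncertainty (exploration); all numbers below are illustrative:

```python
import numpy as np

def ucb(mu, sigma, kappa=2.0):
    """Upper Confidence Bound: posterior mean (exploitation)
    plus an uncertainty bonus (exploration), weighted by kappa."""
    return mu + kappa * sigma

# Toy posterior over 5 candidate points (illustrative values only).
mu = np.array([0.9, 0.5, 0.2, 0.7, 0.1])      # predicted mean
sigma = np.array([0.05, 0.4, 0.9, 0.1, 1.0])  # predictive std

# kappa=0 is pure exploitation (picks the highest-mean candidate);
# large kappa favors uncertain, unexplored regions instead.
greedy_pick = int(np.argmax(ucb(mu, sigma, kappa=0.0)))   # index 0
explore_pick = int(np.argmax(ucb(mu, sigma, kappa=2.0)))  # index 4
```

In LLM-based optimization there is no such explicit `kappa`: the balance emerges implicitly from prompt-based reasoning, which is what makes it hard to analyze or control.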

Abstract

The exploration–exploitation trade-off is central to sequential decision-making and black-box optimization, yet how Large Language Models (LLMs) reason about and manage this trade-off remains poorly understood. Unlike Bayesian Optimization, where exploration and exploitation are explicitly encoded through acquisition functions, LLM-based optimization relies on implicit, prompt-based reasoning over historical evaluations, making search behavior difficult to analyze or control. In this work, we present a metric-level study of LLM-mediated search policy learning, examining how LLMs construct and adapt exploration–exploitation strategies under multiple operational definitions of exploration, including informativeness, diversity, and representativeness. We show that single-agent LLM approaches, which jointly perform strategy selection and candidate generation within a single prompt, suffer from cognitive overload, leading to unstable search dynamics and premature convergence. To address this limitation, we propose a multi-agent framework that decomposes exploration–exploitation control into strategic policy mediation and tactical candidate generation. A strategy agent assigns interpretable weights to multiple search criteria, while a generation agent produces candidates conditioned on the resulting weight-defined search policy. This decomposition renders exploration–exploitation decisions explicit, observable, and adjustable. Empirical results across various continuous optimization benchmarks indicate that separating strategic control from candidate generation substantially improves the effectiveness of LLM-mediated search.
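The strategy/generation split described above can be sketched as two cooperating functions. This is a hypothetical stand-in, not the paper's implementation: the real framework uses LLM agents, and the criterion names, scoring rules, and weight schedule below are all illustrative assumptions for a 1-D maximization problem.

```python
import numpy as np

# Criteria the strategy agent weighs; the exploration criteria mirror the
# paper's operational definitions (the scoring heuristics are assumptions).
CRITERIA = ["exploitation", "informativeness", "diversity", "representativeness"]

def strategy_agent(step, horizon):
    """Assigns interpretable weights to the search criteria.
    Stand-in policy: shift weight from exploration to exploitation over time."""
    explore = 1.0 - step / horizon
    w = np.array([1.0 - explore, explore / 3, explore / 3, explore / 3])
    return w / w.sum()

def generation_agent(weights, candidates, history_x, history_y):
    """Scores candidates under the weight-defined policy; returns the best."""
    hx = np.array(history_x)
    best_x = history_x[int(np.argmax(history_y))]
    scores = []
    for x in candidates:
        dists = np.abs(hx - x)
        crit = np.array([
            -abs(x - best_x),           # exploitation: near the incumbent best
            dists.min(),                # informativeness: far from evaluated points
            dists.mean(),               # diversity: spread across the space
            -abs(x - hx.mean()),        # representativeness: near the data's mass
        ])
        scores.append(float(weights @ crit))
    return candidates[int(np.argmax(scores))]

history_x, history_y = [0.1, 0.5, 0.9], [0.2, 0.8, 0.3]
candidates = [0.0, 0.45, 0.55, 1.0]
w = strategy_agent(step=0, horizon=10)   # early: weight on exploration criteria
x_next = generation_agent(w, candidates, history_x, history_y)
```

The point of the decomposition is that `w` is an explicit, inspectable object: the trade-off can be read off (and adjusted) directly, instead of being buried in a single prompt that must both choose a strategy and generate candidates.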