Greedy Is a Strong Default: Agents as Iterative Optimizers

arXiv cs.AI / 3/31/2026


Key Points

  • The paper replaces classical random candidate proposal steps with an LLM agent that uses evaluation diagnostics to propose better candidates in an iterative optimization loop.
  • Experiments across four discrete, mixed, and continuous optimization tasks show that greedy hill climbing with early stopping matches or outperforms more complex configurations while using substantially fewer evaluations.
  • A cross-task ablation finds that simulated annealing, parallel investigators, and a second LLM (OpenAI Codex) do not improve outcomes and instead increase evaluation cost by about 2–3×.
  • Results indicate the LLM’s learned prior is strong enough that sophisticated acceptance rules add limited value, with round 1 often accounting for most of the gains.
  • Beyond performance, the approach can yield interpretable outputs, such as cancer classification rules that reflect established cytopathology concepts.
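
To make the core loop concrete, here is a minimal sketch of greedy hill climbing with early stopping, where the proposal step is informed by evaluation diagnostics. The function names (`greedy_hill_climb`, `propose`, `evaluate`), the toy objective, and the diagnostics dictionary are illustrative assumptions, not the paper's actual interface; in the paper, the proposer is an LLM agent reasoning over diagnostics, which the stub below only stands in for.

```python
def greedy_hill_climb(evaluate, propose, init, max_rounds=10, patience=2):
    """Greedy hill climbing with early stopping.

    `propose(current, diagnostics)` stands in for the paper's LLM agent;
    here it is a simple deterministic stub so the sketch runs end to end.
    """
    current = init
    best_score, diagnostics = evaluate(current)
    stale = 0
    for _ in range(max_rounds):
        candidate = propose(current, diagnostics)
        score, diag = evaluate(candidate)
        if score > best_score:            # greedy: accept only strict improvements
            current, best_score, diagnostics = candidate, score, diag
            stale = 0
        else:
            stale += 1
            if stale >= patience:         # early stopping: no recent improvement
                break
    return current, best_score

# Toy task (hypothetical): maximize -(x - 3)^2; diagnostics report which
# direction is uphill, mimicking the "evaluation diagnostics" the agent sees.
def evaluate(x):
    return -(x - 3.0) ** 2, {"uphill": 1.0 if x < 3.0 else -1.0}

def propose(x, diag):
    # A real system would query the LLM with the diagnostics;
    # this stub just steps in the direction the diagnostics suggest.
    return x + 0.5 * diag["uphill"]

best_x, best = greedy_hill_climb(evaluate, propose, init=0.0, max_rounds=20)
```

The point of the sketch is the control flow, not the proposer: once the proposer is informed rather than random, a strict-improvement acceptance rule plus a small patience budget is the entire algorithm.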

Abstract

Classical optimization algorithms--hill climbing, simulated annealing, population-based methods--generate candidate solutions via random perturbations. We replace the random proposal generator with an LLM agent that reasons about evaluation diagnostics to propose informed candidates, and ask: does the classical optimization machinery still help when the proposer is no longer random? We evaluate on four tasks spanning discrete, mixed, and continuous search spaces (all replicated across 3 independent runs): rule-based classification on Breast Cancer (test accuracy 86.0% to 96.5%), mixed hyperparameter optimization for MobileNetV3-Small on STL-10 (84.5% to 85.8%, zero catastrophic failures vs. 60% for random search), LoRA fine-tuning of Qwen2.5-0.5B on SST-2 (89.5% to 92.7%, matching Optuna TPE with 2x efficiency), and XGBoost on Adult Census (AUC 0.9297 to 0.9317, tying CMA-ES with 3x fewer evaluations). Empirically, on these tasks: a cross-task ablation shows that simulated annealing, parallel investigators, and even a second LLM (OpenAI Codex) provide no benefit over greedy hill climbing while requiring 2-3x more evaluations. In our setting, the LLM's learned prior appears strong enough that acceptance-rule sophistication has limited impact--round 1 alone delivers the majority of improvement, and variants converge to similar configurations across strategies. The practical implication is surprising simplicity: greedy hill climbing with early stopping is a strong default. Beyond accuracy, the framework produces human-interpretable artifacts--the discovered cancer classification rules independently recapitulate established cytopathology principles.
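
The "acceptance-rule sophistication" the ablation tests boils down to how a candidate's score change is handled. A minimal sketch of the two rules being compared, assuming a standard Metropolis-style annealing criterion (the function names and the `rng` parameter are illustrative, not from the paper):

```python
import math
import random

def greedy_accept(delta, _temp=None):
    # Greedy hill climbing: accept only strict improvements.
    return delta > 0

def annealing_accept(delta, temp, rng=random.random):
    # Simulated annealing (Metropolis rule): additionally accept some
    # worsening moves, with probability exp(delta / temp) when delta <= 0.
    return delta > 0 or rng() < math.exp(delta / temp)
```

The paper's finding is that with an LLM proposer, the extra tolerance for worsening moves (the second clause of `annealing_accept`) buys nothing on these tasks while consuming 2-3x more evaluations, which is why the greedy rule is recommended as the default.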