Greedy Is a Strong Default: Agents as Iterative Optimizers

arXiv cs.AI / 3/31/2026


Key Points

  • The paper replaces classical random candidate proposal steps with an LLM agent that uses evaluation diagnostics to propose better candidates in an iterative optimization loop.
  • Experiments across four discrete, mixed, and continuous optimization tasks show that greedy hill climbing with early stopping matches or outperforms more complex configurations while using substantially fewer evaluations.
  • A cross-task ablation finds that simulated annealing, parallel investigators, and a second LLM (OpenAI Codex) do not improve outcomes and instead increase evaluation cost by about 2–3×.
  • Results indicate the LLM’s learned prior is strong enough that sophisticated acceptance rules add limited value, with round 1 often accounting for most of the gains.
  • Beyond performance, the approach can yield interpretable outputs, such as cancer classification rules that reflect established cytopathology concepts.
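
To make the core loop concrete, here is a minimal sketch of greedy hill climbing with early stopping, where the proposal step is informed by evaluation diagnostics. The function names (`greedy_hill_climb`, `propose`, `evaluate`), the toy objective, and the diagnostics dictionary are illustrative assumptions, not the paper's actual interface; in the paper, the proposer is an LLM agent reasoning over diagnostics, which the stub below only stands in for.

```python
def greedy_hill_climb(evaluate, propose, init, max_rounds=10, patience=2):
    """Greedy hill climbing with early stopping.

    `propose(current, diagnostics)` stands in for the paper's LLM agent;
    here it is a simple deterministic stub so the sketch runs end to end.
    """
    current = init
    best_score, diagnostics = evaluate(current)
    stale = 0
    for _ in range(max_rounds):
        candidate = propose(current, diagnostics)
        score, diag = evaluate(candidate)
        if score > best_score:            # greedy: accept only strict improvements
            current, best_score, diagnostics = candidate, score, diag
            stale = 0
        else:
            stale += 1
            if stale >= patience:         # early stopping: no recent improvement
                break
    return current, best_score

# Toy task (hypothetical): maximize -(x - 3)^2; diagnostics report which
# direction is uphill, mimicking the "evaluation diagnostics" the agent sees.
def evaluate(x):
    return -(x - 3.0) ** 2, {"uphill": 1.0 if x < 3.0 else -1.0}

def propose(x, diag):
    # A real system would query the LLM with the diagnostics;
    # this stub just steps in the direction the diagnostics suggest.
    return x + 0.5 * diag["uphill"]

best_x, best = greedy_hill_climb(evaluate, propose, init=0.0, max_rounds=20)
```

The point of the sketch is the control flow, not the proposer: once the proposer is informed rather than random, a strict-improvement acceptance rule plus a small patience budget is the entire algorithm.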

Abstract

Classical optimization algorithms--hill climbing, simulated annealing, population-based methods--generate candidate solutions via random perturbations. We replace the random proposal generator with an LLM agent that reasons about evaluation diagnostics to propose informed candidates, and ask: does the classical optimization machinery still help when the proposer is no longer random? We evaluate on four tasks spanning discrete, mixed, and continuous search spaces (all replicated across 3 independent runs): rule-based classification on Breast Cancer (test accuracy 86.0% to 96.5%), mixed hyperparameter optimization for MobileNetV3-Small on STL-10 (84.5% to 85.8%, zero catastrophic failures vs. 60% for random search), LoRA fine-tuning of Qwen2.5-0.5B on SST-2 (89.5% to 92.7%, matching Optuna TPE with 2x efficiency), and XGBoost on Adult Census (AUC 0.9297 to 0.9317, tying CMA-ES with 3x fewer evaluations). Empirically, on these tasks: a cross-task ablation shows that simulated annealing, parallel investigators, and even a second LLM (OpenAI Codex) provide no benefit over greedy hill climbing while requiring 2-3x more evaluations. In our setting, the LLM's learned prior appears strong enough that acceptance-rule sophistication has limited impact--round 1 alone delivers the majority of improvement, and variants converge to similar configurations across strategies. The practical implication is surprising simplicity: greedy hill climbing with early stopping is a strong default. Beyond accuracy, the framework produces human-interpretable artifacts--the discovered cancer classification rules independently recapitulate established cytopathology principles.
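
The "acceptance-rule sophistication" the ablation tests boils down to how a candidate's score change is handled. A minimal sketch of the two rules being compared, assuming a standard Metropolis-style annealing criterion (the function names and the `rng` parameter are illustrative, not from the paper):

```python
import math
import random

def greedy_accept(delta, _temp=None):
    # Greedy hill climbing: accept only strict improvements.
    return delta > 0

def annealing_accept(delta, temp, rng=random.random):
    # Simulated annealing (Metropolis rule): additionally accept some
    # worsening moves, with probability exp(delta / temp) when delta <= 0.
    return delta > 0 or rng() < math.exp(delta / temp)
```

The paper's finding is that with an LLM proposer, the extra tolerance for worsening moves (the second clause of `annealing_accept`) buys nothing on these tasks while consuming 2-3x more evaluations, which is why the greedy rule is recommended as the default.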