LLM Prompt Duel Optimizer: Efficient Label-Free Prompt Optimization

arXiv stat.ML / 4/10/2026


Key Points

  • The paper introduces the Prompt Duel Optimizer (PDO), a sample-efficient framework for label-free LLM prompt optimization that relies on pairwise preference feedback from an LLM judge rather than costly ground-truth labels.
  • PDO formulates prompt search as a dueling-bandit problem and uses Double Thompson Sampling to choose the most informative prompt comparisons within a fixed judge budget.
  • It also employs top-performer guided mutation to iteratively expand the candidate prompt set while pruning weaker prompts to improve efficiency.
  • Experiments on BIG-bench Hard (BBH) and MS MARCO indicate PDO finds better prompts than label-free baselines and achieves strong quality–cost trade-offs when comparison budgets are limited.

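To make the dueling-bandit framing concrete, here is a minimal sketch of Double Thompson Sampling over prompt duels. All names (`dts_select`, `optimize`, the `judge` callable) are illustrative, not from the paper; it keeps Beta posteriors over pairwise win rates, samples twice to pick an informative duel, and spends a fixed judge budget.

```python
import random

def dts_select(wins, n_prompts):
    """Pick a duel (i, j) via two rounds of Thompson sampling."""
    # Sample a win-probability matrix from Beta posteriors over past duels.
    theta = [[0.5] * n_prompts for _ in range(n_prompts)]
    for i in range(n_prompts):
        for j in range(n_prompts):
            if i != j:
                theta[i][j] = random.betavariate(wins[i][j] + 1, wins[j][i] + 1)
    # First arm: highest sampled Copeland score (# of opponents beaten).
    copeland = [sum(theta[i][j] > 0.5 for j in range(n_prompts) if j != i)
                for i in range(n_prompts)]
    first = max(range(n_prompts), key=lambda i: copeland[i])
    # Second arm: resample and pick the strongest challenger to `first`.
    second = max((j for j in range(n_prompts) if j != first),
                 key=lambda j: random.betavariate(wins[j][first] + 1,
                                                  wins[first][j] + 1))
    return first, second

def optimize(prompts, judge, budget):
    """Run `budget` LLM-judged duels and return the best prompt found.

    `judge(a, b)` returns True when prompt `a` is preferred over `b`;
    in practice this would be a pairwise-preference LLM call.
    """
    n = len(prompts)
    wins = [[0] * n for _ in range(n)]  # wins[i][j]: times i beat j
    for _ in range(budget):
        i, j = dts_select(wins, n)
        if judge(prompts[i], prompts[j]):
            wins[i][j] += 1
        else:
            wins[j][i] += 1
    # Winner: best Copeland score under the observed win counts.
    score = [sum(wins[i][j] > wins[j][i] for j in range(n)) for i in range(n)]
    return prompts[max(range(n), key=lambda i: score[i])]
```

Because both arms are drawn from the posterior rather than a point estimate, uncertain matchups get compared more often, which is what keeps the judge budget focused on informative duels.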
Abstract

Large language models (LLMs) are highly sensitive to prompts, but most automatic prompt optimization (APO) methods assume access to ground-truth references (e.g., labeled validation data) that are costly to obtain. We propose the Prompt Duel Optimizer (PDO), a sample-efficient framework for label-free prompt optimization based on pairwise preference feedback from an LLM judge. PDO casts prompt selection as a dueling-bandit problem and combines (i) Double Thompson Sampling to prioritize informative comparisons under a fixed judge budget, with (ii) top-performer guided mutation to expand the candidate pool while pruning weak prompts. Experiments on BIG-bench Hard (BBH) and MS MARCO show that PDO consistently identifies stronger prompts than label-free baselines, while offering favorable quality–cost trade-offs under constrained comparison budgets.
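The top-performer guided mutation step can be sketched as a simple evolve-and-prune loop. This is a hedged illustration, not the paper's implementation: `mutate_fn` stands in for an LLM call that rewrites a prompt, and the `keep`/`spawn` parameters are assumed knobs.

```python
def evolve_pool(prompts, scores, mutate_fn, keep=4, spawn=2):
    """Keep the `keep` highest-scoring prompts, prune the rest, and add
    `spawn` mutations of the current top performer to the pool.

    `scores` maps each prompt to its current estimate (e.g., a Copeland
    score from the duel phase); `mutate_fn(prompt, k)` produces the k-th
    rewritten variant (an LLM rewrite in practice).
    """
    ranked = sorted(prompts, key=lambda p: scores[p], reverse=True)
    survivors = ranked[:keep]                             # prune weak prompts
    top = survivors[0]
    children = [mutate_fn(top, k) for k in range(spawn)]  # expand the pool
    return survivors + children
```

Alternating this expansion step with the dueling-bandit selection phase lets the candidate pool grow around what is currently winning, while the budget stays bounded because weak prompts are dropped each round.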