The Last Harness You'll Ever Build

arXiv cs.AI / 4/25/2026


Key Points

  • The paper argues that deploying AI agents in complex, domain-specific enterprise workflows still requires heavy, expert-driven “harness engineering” (prompts, tools, orchestration, and evaluation design).
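  • It proposes a two-level framework. The first level, a Harness Evolution Loop, improves a harness for a single task: a worker agent executes the task, an evaluator agent adversarially diagnoses failures and scores performance, and an evolution agent modifies the harness based on the full history of prior attempts.
  • The second level, a Meta-Evolution Loop, optimizes the evolution protocol itself across diverse tasks, so that harnesses converge rapidly on new, unseen tasks without additional human harness engineering.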
  • The authors formalize the approach as meta-learning and provide algorithms for both loops, positioning the work as automation of both harness creation and the automation process itself.
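
The Harness Evolution Loop summarized in the key points can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the harness is reduced to a tuple of instruction strings, and `evolve_harness`, `Attempt`, `toy_worker`, `toy_evaluator`, `toy_evolver`, and `REQUIRED` are all hypothetical names invented for the example. In a real system the three callables would wrap model calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumption: a harness is modeled as a tuple of instruction strings.
Harness = Tuple[str, ...]


@dataclass
class Attempt:
    """One entry in the history that the evolution agent reads."""
    harness: Harness
    output: str
    score: float
    diagnosis: str


def evolve_harness(
    task: str,
    initial_harness: Harness,
    worker: Callable[[Harness, str], str],
    evaluator: Callable[[str, str], Tuple[float, str]],
    evolver: Callable[[List[Attempt]], Harness],
    max_iters: int = 10,
    target: float = 1.0,
) -> Tuple[Harness, List[Attempt]]:
    """Run worker -> evaluator -> evolver until the target score is reached."""
    history: List[Attempt] = []
    harness = initial_harness
    for _ in range(max_iters):
        output = worker(harness, task)               # worker executes the task
        score, diagnosis = evaluator(task, output)   # evaluator scores and diagnoses
        history.append(Attempt(harness, output, score, diagnosis))
        if score >= target:
            break
        harness = evolver(history)                   # evolver sees the full history
    best = max(history, key=lambda a: a.score)
    return best.harness, history


# Toy components (hypothetical): the task "needs" three instructions.
REQUIRED = ("cite sources", "validate form fields", "summarize findings")


def toy_worker(harness: Harness, task: str) -> str:
    # The toy output simply lists the instructions the worker followed.
    return "; ".join(harness)


def toy_evaluator(task: str, output: str) -> Tuple[float, str]:
    missing = [r for r in REQUIRED if r not in output]
    score = 1 - len(missing) / len(REQUIRED)
    return score, (missing[0] if missing else "")


def toy_evolver(history: List[Attempt]) -> Harness:
    # Build on the best attempt so far and patch its diagnosed failure.
    base = max(history, key=lambda a: a.score)
    return base.harness + (base.diagnosis,)


best_harness, history = evolve_harness(
    "demo task", (), toy_worker, toy_evaluator, toy_evolver
)
print(best_harness)
print([round(a.score, 2) for a in history])
```

Passing the whole history to the evolver, rather than only the latest attempt, mirrors the "history-based" modification the summary describes: the evolver can build on the best attempt so far instead of the most recent one.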

Abstract

AI agents are increasingly deployed on complex, domain-specific workflows: navigating enterprise web applications that require dozens of clicks and form fills; orchestrating multi-step research pipelines that span search, extraction, and synthesis; automating code review across unfamiliar repositories; and handling customer escalations that demand nuanced domain knowledge. Each new task domain requires painstaking, expert-driven harness engineering: designing the prompts, tools, orchestration logic, and evaluation criteria that make a foundation model effective. We present a two-level framework that automates this process. At the first level, the Harness Evolution Loop optimizes a worker agent's harness $\mathcal{H}$ for a single task: a Worker Agent $W_{\mathcal{H}}$ executes the task, an Evaluator Agent $V$ adversarially diagnoses failures and scores performance, and an Evolution Agent $E$ modifies the harness based on the full history of prior attempts. At the second level, the Meta-Evolution Loop optimizes the evolution protocol $\Lambda = (W_{\mathcal{H}}, \mathcal{H}^{(0)}, V, E)$ itself across diverse tasks, learning a protocol $\Lambda^{(\text{best})}$ that enables rapid harness convergence on any new task, so that adapting an agent to a novel domain requires no human harness engineering at all. We formalize the correspondence to meta-learning and present both algorithms. The framework shifts manual harness engineering into automated harness engineering, and takes one step further: automating the design of the automation itself.
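
The Meta-Evolution Loop can be illustrated with a small Python sketch. The abstract does not say how the protocol is optimized, so this example swaps in the simplest possible stand-in: exhaustive comparison of a hand-written candidate set, scored by average steps to convergence on training tasks. `Protocol`, `steps_to_converge`, `meta_evolve`, and the toy worker, evaluator, and evolvers are all hypothetical names, and a task is modeled as nothing more than the set of instructions it needs.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Sequence, Tuple

Harness = Tuple[str, ...]
Task = FrozenSet[str]                       # toy: the instructions a task needs
Record = Tuple[Harness, float, List[str]]   # (harness, score, diagnosed gaps)


@dataclass(frozen=True)
class Protocol:
    """Toy analogue of the tuple (worker, initial harness, evaluator, evolver)."""
    name: str
    worker: Callable[[Harness, Task], FrozenSet[str]]
    initial_harness: Harness
    evaluator: Callable[[Task, FrozenSet[str]], Tuple[float, List[str]]]
    evolver: Callable[[List[Record]], Harness]


def steps_to_converge(protocol: Protocol, task: Task, max_iters: int = 20) -> int:
    """Inner loop: count iterations until the evaluator reports a full score."""
    harness = protocol.initial_harness
    history: List[Record] = []
    for step in range(1, max_iters + 1):
        output = protocol.worker(harness, task)
        score, missing = protocol.evaluator(task, output)
        history.append((harness, score, missing))
        if score >= 1.0:
            return step
        harness = protocol.evolver(history)
    return max_iters + 1  # penalty when the harness never converges


def meta_evolve(candidates: Sequence[Protocol], tasks: Sequence[Task]) -> Protocol:
    """Outer loop (assumed): keep the protocol with the fewest average steps."""
    def cost(p: Protocol) -> float:
        return sum(steps_to_converge(p, t) for t in tasks) / len(tasks)
    return min(candidates, key=cost)


def worker(harness: Harness, task: Task) -> FrozenSet[str]:
    return frozenset(harness)  # toy: the output is the instructions followed


def evaluator(task: Task, output: FrozenSet[str]) -> Tuple[float, List[str]]:
    missing = sorted(task - output)
    return 1 - len(missing) / len(task), missing


def fix_one(history: List[Record]) -> Harness:
    harness, _, missing = history[-1]
    return harness + tuple(missing[:1])  # patch a single diagnosed gap


def fix_all(history: List[Record]) -> Harness:
    harness, _, missing = history[-1]
    return harness + tuple(missing)      # patch every diagnosed gap at once


patch_one = Protocol("patch-one", worker, (), evaluator, fix_one)
patch_all = Protocol("patch-all", worker, (), evaluator, fix_all)

TRAIN_TASKS = [
    frozenset({"log in", "fill form", "submit"}),
    frozenset({"search", "cite sources"}),
]

best = meta_evolve([patch_one, patch_all], TRAIN_TASKS)
unseen = frozenset({"clone repo", "run tests", "review diff", "comment"})
print(best.name, steps_to_converge(best, unseen))
```

The selected protocol is then applied unchanged to a task it was not scored on, which is the sense in which adapting to a new domain needs no further human harness engineering. A realistic meta-loop would presumably generate and revise candidate protocols rather than choose from a fixed list.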