Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses
arXiv cs.CL / 4/29/2026
Key Points
- The paper proposes Agentic Harness Engineering (AHE) to automate the evolution of coding-agent “harnesses,” which strongly influence how models run tasks against repositories and tools.
- AHE adds matched observability to three stages: component editing, trajectory inspection, and decision making. It makes the action space explicit (component observability), builds a drill-down evidence corpus from long trajectories (experience observability), and links each edit to a prediction that is later validated against task outcomes (decision observability).
- By turning each harness edit into a falsifiable contract, AHE aims to avoid naive trial-and-error during harness optimization.
- Experiments show that after ten AHE iterations, pass@1 on Terminal-Bench 2 improves from 69.7% to 77.0% (a gain of 7.3 percentage points), beating a human-designed harness (Codex-CLI) and strong self-evolving baselines.
- The evolved (then frozen) harness transfers to other settings, improving token efficiency on SWE-bench Verified and delivering cross-family gains on Terminal-Bench 2, which suggests the learned components generalize beyond a single benchmark.
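
The "falsifiable contract" idea in the points above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `EditContract` class, its fields, the `pass_at_1` helper, and the toy outcomes are all hypothetical names and values chosen for illustration. The sketch assumes a contract is simply a predicted minimum pass@1 gain that is checked against outcomes observed before and after the edit.

```python
from dataclasses import dataclass
from typing import List, Optional


def pass_at_1(outcomes: List[bool]) -> float:
    """Percentage of tasks solved on the first attempt (one bool per task)."""
    return 100.0 * sum(outcomes) / len(outcomes)


@dataclass
class EditContract:
    """Hypothetical record pairing one harness edit with a testable prediction."""
    component: str             # which harness component was edited
    description: str           # what the edit changes
    predicted_min_gain: float  # predicted pass@1 gain, in percentage points
    observed_gain: Optional[float] = None

    def validate(self, before: List[bool], after: List[bool]) -> bool:
        """Record the observed gain and report whether the prediction held."""
        self.observed_gain = pass_at_1(after) - pass_at_1(before)
        return self.observed_gain >= self.predicted_min_gain


# Toy data (illustrative only): four tasks, before and after one edit.
before = [True, False, False, True]
after = [True, True, False, True]

contract = EditContract(
    component="system_prompt",
    description="Remind the agent to run tests before finishing",
    predicted_min_gain=2.0,
)
keep_edit = contract.validate(before, after)

# A second edit whose prediction is too optimistic and is therefore falsified.
bold_contract = EditContract("tool_config", "Raise the command timeout", 30.0)
keep_bold_edit = bold_contract.validate(before, after)
```

In this sketch an edit is kept only when its stated prediction holds, so a falsified prediction becomes evidence in its own right instead of one more unexplained failed trial.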