Meta-Harness: End-to-End Optimization of Model Harnesses
arXiv cs.AI, March 31, 2026
Key Points
- The paper argues that LLM performance depends not only on model weights but also on the “harness” code that controls what context is stored, retrieved, and presented to the model.
- It introduces Meta-Harness, an outer-loop agentic system that searches over harness code: an LLM proposer reads and edits the harness source, and each candidate is evaluated via benchmark scores and execution traces recorded on the filesystem (see the sketch after this list).
- On online text classification, Meta-Harness improves accuracy by 7.7 points over a state-of-the-art context-management baseline while using 4× fewer context tokens.
- For retrieval-augmented math reasoning, a single automatically discovered harness boosts accuracy by 4.7 points on average across five held-out models on a set of 200 IMO-level problems.
- In agentic coding tasks, the discovered harnesses outperform the best hand-engineered baselines on TerminalBench-2, suggesting automated harness engineering can materially improve real applications.
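
The digest gives no implementation details, but the loop it describes maps onto a simple propose-evaluate-select search over harness source code. Below is a minimal Python sketch of that pattern; the helper names `propose_harness` and `evaluate_harness`, the `Candidate` type, and the file layout are all hypothetical, not the paper's actual interfaces.

```python
# Hypothetical sketch of an outer-loop harness search in the style of
# Meta-Harness. The proposer and evaluator are stubs to be filled in.
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Candidate:
    source: str   # harness source code proposed by the LLM
    score: float  # benchmark score achieved by this harness
    trace: str    # execution trace recorded during evaluation

def propose_harness(best: Candidate, traces_dir: Path) -> str:
    """Hypothetical proposer: an LLM reads the current best harness
    source plus the recorded traces and returns an edited version."""
    raise NotImplementedError("stub: call your proposer model here")

def evaluate_harness(source: str, traces_dir: Path) -> tuple[float, str]:
    """Hypothetical evaluator: runs the harness on a benchmark and
    returns (score, execution trace)."""
    raise NotImplementedError("stub: run the benchmark here")

def meta_harness_search(seed_source: str, steps: int,
                        traces_dir: Path) -> Candidate:
    """Outer loop: propose an edit, score it, keep it if it improves."""
    traces_dir.mkdir(parents=True, exist_ok=True)
    score, trace = evaluate_harness(seed_source, traces_dir)
    best = Candidate(seed_source, score, trace)
    for step in range(steps):
        edited = propose_harness(best, traces_dir)
        score, trace = evaluate_harness(edited, traces_dir)
        # Persist every candidate's trace so later proposals can inspect it.
        (traces_dir / f"step_{step}.log").write_text(trace)
        if score > best.score:
            best = Candidate(edited, score, trace)
    return best
```

The greedy keep-if-better rule here is one plausible selection strategy; the paper may use a different search procedure, but the core idea of an agent iterating on harness code against recorded scores and traces is the same.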