someone built an ai agent that autonomously upgraded itself to #1 across multiple domains in under 24 hours... then open sourced the entire thing
but here’s why it actually works:
- agents fucking suck, not because of the model but because of their harness (tools, system prompts, etc.)
- Auto agent creates a Meta agent that tweaks your agent's harness, runs tests, and improves it again, until it's #1 at its goal
- best part: you can set this up for ANY task. in this article he uses it for terminal bench (code) and spreadsheets (financial modelling) - it topped rankings for both :)
- secret sauce: he used THE SAME MODEL to evaluate the agent - claude managing claude = better understanding of why it failed and how to improve it
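
to make the loop concrete, here's a minimal sketch of the tweak, test, keep-if-better cycle described above. this is not the project's actual code: `Harness`, `run_benchmark`, `TWEAKS` and `improve` are hypothetical names, the scorer is a toy stand-in for a real test suite, and the fixed tweak list stands in for the meta agent (which, per the post, would be the same model reading the failures and proposing changes).

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Harness:
    """Everything around the model: system prompt plus available tools."""
    system_prompt: str
    tools: tuple[str, ...] = ()


Tweak = Callable[[Harness], Harness]


def add_tool(name: str) -> Tweak:
    # Returns a tweak that adds a tool if the harness does not have it yet.
    return lambda h: h if name in h.tools else replace(h, tools=h.tools + (name,))


def add_prompt_line(line: str) -> Tweak:
    # Returns a tweak that appends an instruction to the system prompt.
    return lambda h: h if line in h.system_prompt else replace(
        h, system_prompt=h.system_prompt + " " + line
    )


def run_benchmark(harness: Harness) -> float:
    """Toy scorer (assumption): a real setup would run the agent on a test suite."""
    wanted = {"shell", "file_editor", "test_runner"}
    tool_score = len(wanted & set(harness.tools)) / len(wanted)
    prompt_score = 1.0 if "verify" in harness.system_prompt else 0.0
    return 0.75 * tool_score + 0.25 * prompt_score


# Stand-in for the meta agent's proposals (assumption): a fixed list,
# where a real meta agent would generate tweaks from failure traces.
TWEAKS: list[Tweak] = [
    add_tool("shell"),
    add_tool("web_search"),  # does not help the toy score, so it gets rejected
    add_tool("file_editor"),
    add_tool("test_runner"),
    add_prompt_line("Always verify your work."),
]


def improve(harness: Harness, rounds: int) -> tuple[Harness, float]:
    """Try one tweak per round and keep it only if the score strictly improves."""
    best, best_score = harness, run_benchmark(harness)
    for i in range(rounds):
        candidate = TWEAKS[i % len(TWEAKS)](best)
        score = run_benchmark(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

calling `improve(Harness("You are a coding agent."), rounds=5)` walks through each tweak once and keeps only the ones that raise the score. the model itself never changes; only the harness around it does, which is the whole point of the approach.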
humans were the fucking bottleneck. this not only saves you a load of time, it's just a better way to train agents for domain-specific tasks