Can AI Models Direct Each Other? Organizational Structure as a Probe into Training Limitations
arXiv cs.AI / 3/30/2026
📰 News
Key Points
- The paper tests whether an expensive "manager" LLM can direct a cheaper "worker" LLM to solve software engineering tasks, using a two-agent Manager-Worker pipeline in which task dispatch and code execution are handled by external orchestration code.
- Across 200 SWE-bench Lite instances, a strong manager guiding a weak worker reaches 62% accuracy, comparable to a strong single model at 60% accuracy while using far fewer “strong-model” tokens.
- A weak manager directing a weak worker underperforms the weak baseline (42% vs. 44%), indicating that the manager-worker setup only helps when there is a real capability gap and effective direction.
- The authors find that the value comes from active delegation and structured exploration rather than review-only loops: review-only supervision adds just +2 percentage points, while planning/exploration adds about +11 points.
- The results suggest a training limitation: current models are largely trained as monolithic agents, so splitting roles into director/worker fights the training distribution; the proposed fix is to keep each agent near its trained mode and externalize organizational structure in code.
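The last point, keeping each agent near its trained mode while the organizational structure lives in ordinary code, can be sketched roughly as follows. This is a minimal illustration, not the paper's pipeline: the function names (`call_manager`, `call_worker`, `orchestrate`) and the toy plan are hypothetical stand-ins for real LLM calls.

```python
# Minimal sketch of a two-agent manager-worker loop where the
# organizational structure is externalized in code: the manager only
# plans, the worker only acts, and plain Python mediates between them.
# All names here are hypothetical; real implementations would replace
# call_manager/call_worker with actual LLM API calls.

def call_manager(task: str, history: list[str]) -> str:
    """Stand-in for the expensive 'manager' model: emit the next directive."""
    plan = ["explore repo", "locate bug", "write patch"]  # toy fixed plan
    step = len(history)
    return plan[step] if step < len(plan) else "done"

def call_worker(directive: str) -> str:
    """Stand-in for the cheap 'worker' model: execute one directive."""
    return f"result of '{directive}'"

def orchestrate(task: str, max_steps: int = 10) -> list[str]:
    """The control flow (delegation, stopping) lives outside either model."""
    history: list[str] = []
    for _ in range(max_steps):
        directive = call_manager(task, history)
        if directive == "done":
            break
        history.append(call_worker(directive))
    return history

transcript = orchestrate("fix failing test in repo X")
```

The design point is that neither model is asked to role-play an unfamiliar organizational protocol; each sees a prompt close to its training distribution, and the loop itself encodes the hierarchy.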