Effective Strategies for Asynchronous Software Engineering Agents

arXiv cs.CL / 3/24/2026

Key Points

  • The paper addresses why AI SWE agents struggle with long-horizon, multi-step tasks and proposes asynchronous multi-agent collaboration as a way to improve timeliness and throughput.
  • It introduces CAID (Centralized Asynchronous Isolated Delegation), a coordination paradigm that uses centralized, dependency-aware task delegation plus isolated workspaces to reduce interfering concurrent edits.
  • CAID consolidates partial agent progress through structured integration backed by executable, test-based verification, targeting both correctness and completion reliability.
  • In evaluations, CAID improves accuracy over single-agent baselines by 26.7% (absolute) on PaperBench paper-reproduction tasks and by 14.3% on Commit0 Python library development tasks.
  • The authors conclude that branch-and-merge is a key coordination mechanism for multi-agent SWE, and that git worktree/commit/merge can implement it reliably in an executable workflow.
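The dependency-aware, asynchronous delegation described in the points above can be illustrated with a small scheduler. This is only a sketch under stated assumptions, not the paper's implementation: `run_plan`, the task names, and the `worker` coroutine (a stand-in for an engineer agent) are hypothetical, and every dependency is assumed to be listed in `tasks`.

```python
import asyncio
import graphlib


async def run_plan(tasks, deps, worker):
    """Start each task as soon as all of its dependencies have finished.

    tasks:  list of task names (hypothetical identifiers)
    deps:   dict mapping a task name to the set of task names it depends on
    worker: async callable standing in for an agent that performs one task
    Returns the task names in the order they finished.
    """
    # Reject cyclic plans up front; otherwise the waits below would deadlock.
    graphlib.TopologicalSorter({t: deps.get(t, ()) for t in tasks}).prepare()

    done = {name: asyncio.Event() for name in tasks}
    finish_order = []

    async def run_one(name):
        # Wait until every upstream task has signalled completion.
        for dep in deps.get(name, ()):
            await done[dep].wait()
        await worker(name)
        finish_order.append(name)
        done[name].set()

    # Launch all tasks at once; independent tasks proceed concurrently.
    await asyncio.gather(*(run_one(name) for name in tasks))
    return finish_order
```

A manager could call `asyncio.run(run_plan(["utils", "models", "api"], {"models": {"utils"}, "api": {"models"}}, worker))`, where `worker` would hand the subtask to an agent in its own workspace. Tasks with no path between them in the dependency graph run at the same time, which is where the timeliness gain would come from.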

Abstract

AI agents have become increasingly capable at isolated software engineering (SWE) tasks such as resolving issues on GitHub. Yet long-horizon tasks involving multiple interdependent subtasks still pose challenges with respect to both accuracy and timely completion. A natural approach to solving these long-horizon tasks in a timely manner is asynchronous multi-agent collaboration, where multiple agents work on different parts of the task at the same time. But effective application of multi-agent systems has proven surprisingly difficult: concurrent edits by multiple agents interfere with each other, dependencies are difficult to synchronize, and combining partial progress into a coherent whole is challenging. On the other hand, human developers have long relied on mature collaboration infrastructure to manage these challenges in large software projects. Inspired by these collaboration primitives, we introduce Centralized Asynchronous Isolated Delegation (CAID), a structured multi-agent coordination paradigm grounded in three core SWE primitives: centralized task delegation, asynchronous execution, and isolated workspaces. CAID constructs dependency-aware task plans through a central manager, executes subtasks concurrently in isolated workspaces, and consolidates progress via structured integration with executable test-based verification. In empirical evaluation, we find that CAID improves accuracy over single-agent baselines by 26.7% absolute on paper reproduction tasks (PaperBench) and 14.3% on Python library development tasks (Commit0). Through systematic analysis, we find that branch-and-merge is a central coordination mechanism for multi-agent collaboration, and that SWE primitives such as git worktree, git commit, and git merge enable it to be realized in a reliable and executable manner.
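To make the branch-and-merge idea concrete, the following sketch shows one way the named git primitives could be combined with a test gate. It is an illustration under assumptions rather than the authors' workflow: the `agent/<task>` branch naming, the `../worktrees` directory, the `main` base branch, and all function names are hypothetical, and the merge is assumed to run from the main checkout with the base branch checked out. The runner is injectable so the logic can be exercised without a real repository.

```python
import subprocess


def branch_and_path(task_id, root="../worktrees"):
    # Hypothetical naming scheme: one branch and one directory per subtask.
    return f"agent/{task_id}", f"{root}/{task_id}"


def setup_commands(task_id, base="main", root="../worktrees"):
    # `git worktree add -b <branch> <path> <base>` creates an isolated checkout.
    branch, path = branch_and_path(task_id, root)
    return [["git", "worktree", "add", "-b", branch, path, base]]


def commit_commands(task_id, message, root="../worktrees"):
    # Stage and commit the agent's work inside its own worktree only.
    _, path = branch_and_path(task_id, root)
    return [
        ["git", "-C", path, "add", "-A"],
        ["git", "-C", path, "commit", "-m", message],
    ]


def verify_and_merge(task_id, test_cmd, run=subprocess.run, root="../worktrees"):
    """Merge a subtask branch only if its tests pass inside the worktree."""
    branch, path = branch_and_path(task_id, root)
    result = run(test_cmd, cwd=path)
    if result.returncode != 0:
        return False  # Leave the branch unmerged so the agent can retry.
    run(["git", "merge", "--no-ff", branch], check=True)
    return True
```

Because each agent edits only its own worktree, concurrent edits cannot overwrite one another on disk; conflicts surface only at merge time, after the branch has passed its tests.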