ABSTRAL: Automatic Design of Multi-Agent Systems Through Iterative Refinement and Topology Optimization

arXiv cs.AI / 3/25/2026


Key Points

  • ABSTRAL is a research framework for automatically designing multi-agent system (MAS) architectures by treating the architecture as an evolving natural-language document that is iteratively refined via contrastive trace analysis.
  • The study quantifies a “multi-agent coordination tax,” reporting that under fixed turn budgets, ensembles achieve only 26% turn efficiency and 66% of tasks hit the turn limit, though they still outperform single-agent baselines by finding more parallelizable decompositions.
  • ABSTRAL encodes design knowledge in inspectable documents, showing transfer gains where learned topology reasoning and role templates from one domain reduce cold-start effort on new domains (transferred seeds match cold-start iteration 3 performance in a single iteration).
  • Contrastive trace analysis can discover specialist roles that were not present in any initial design, a capability the authors claim prior systems did not demonstrate.
  • On SOPBench (134 bank tasks) using a GPT-4o backbone, ABSTRAL achieves 70% validation and 65.96% test pass rates, and the converged documents are released for inspection as design rationale.
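The core loop described above — treating the MAS architecture as an evolving document and refining it by contrasting successful and failed execution traces — can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the trace fields, the `toy_propose` function, and the revision string are all hypothetical stand-ins (in the real framework, the proposer would be an LLM analyzing full traces).

```python
from dataclasses import dataclass

@dataclass
class Trace:
    """Hypothetical record of one task run (fields are illustrative)."""
    task_id: str
    passed: bool
    log: str

def contrast(traces):
    """Split traces into successes and failures for side-by-side comparison."""
    wins = [t for t in traces if t.passed]
    losses = [t for t in traces if not t.passed]
    return wins, losses

def revise_design(design_doc, wins, losses, propose):
    """One refinement step: hand the design document plus the contrast
    between winning and losing traces to a proposer (e.g. an LLM)."""
    if not losses:
        return design_doc  # nothing to learn from this batch
    return propose(design_doc, wins, losses)

# Toy proposer standing in for an LLM call: pretend the contrast between
# wins and losses surfaced a missing specialist role (purely illustrative).
def toy_propose(doc, wins, losses):
    amendment = "add role: verifier"
    return doc if amendment in doc else doc + "\n" + amendment

design = "roles: planner, executor\ntopology: star"
traces = [Trace("t1", True, "ok"), Trace("t2", False, "output never checked")]
wins, losses = contrast(traces)
design = revise_design(design, wins, losses, toy_propose)
print(design)
```

The key property this sketch tries to capture is that the design lives in an inspectable text artifact, so a revision such as the discovered "verifier" role is a readable diff against the document rather than an opaque weight update.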

Abstract

How should multi-agent systems be designed, and can that design knowledge be captured in a form that is inspectable, revisable, and transferable? We introduce ABSTRAL, a framework that treats MAS architecture as an evolving natural-language document, an artifact refined through contrastive trace analysis. Three findings emerge. First, we provide a precise measurement of the multi-agent coordination tax: under fixed turn budgets, ensembles achieve only 26% turn efficiency, with 66% of tasks exhausting the limit, yet still improve over single-agent baselines by discovering parallelizable task decompositions. Second, design knowledge encoded in documents transfers: topology reasoning and role templates learned on one domain provide a head start on new domains, with transferred seeds matching cold-start iteration 3 performance in a single iteration. Third, contrastive trace analysis discovers specialist roles absent from any initial design, a capability no prior system demonstrates. On SOPBench (134 bank tasks, deterministic oracle), ABSTRAL reaches 70% validation / 65.96% test pass rate with a GPT-4o backbone. We release the converged documents as inspectable design rationale.