When Agents Evolve, Institutions Follow

arXiv cs.AI / 5/1/2026

Key Points

  • The paper argues that advanced multi-agent systems built on large language models face an organization and coordination challenge similar to that of historical political institutions.
  • It translates seven historical political institutions, spanning four canonical governance patterns, into executable multi-agent architectures and evaluates them under identical conditions.
  • Experiments across three LLMs and two benchmarks show that governance topology strongly affects collective performance.
  • Within the same model, performance differences between the best and worst institutional designs exceed 57 percentage points, and the best architecture varies systematically with model capability and task characteristics.
  • The findings suggest moving beyond “self-evolving agents” toward “self-evolving multi-agent systems,” in which governance mechanisms are reselected and reconfigured as tasks and capabilities change. The code is available on GitHub.

Abstract

Across millennia, complex societies have faced the same coordination problem of how to organize collective action among cognitively bounded and informationally incomplete individuals. Different civilizations developed different political institutions to answer the same basic questions of who proposes, who reviews, who executes, and how errors are corrected. We argue that multi-agent systems built on large language models face the same challenge. Their central problem is not only individual intelligence, but collective organization. Historical institutions therefore provide a structured design space for multi-agent architectures, making key trade-offs between efficiency and error correction, centralization and distribution, and specialization and redundancy empirically testable. We translate seven historical political institutions, spanning four canonical governance patterns, into executable multi-agent architectures and evaluate them under identical conditions across three large language models and two benchmarks. We find that governance topology strongly shapes collective performance. Within a single model, the gap between the best and worst institution exceeds 57 percentage points, while the optimal architecture shifts systematically with model capability and task characteristics. These results suggest that collective intelligence will not advance through a single optimal organizational form, but through governance mechanisms that can be reselected and reconfigured as tasks and capabilities evolve. More broadly, this points to a transition from self-evolving agents to the self-evolving multi-agent system. The code is available on GitHub (https://github.com/cf3i/SocialSystemArena).
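
To make the idea of a "governance topology" concrete, the following is a minimal sketch, not the authors' implementation and not the API of the linked repository. It assumes a topology can be described by who proposes, who reviews, and who executes, and that reselection means picking whichever topology scores best on a set of tasks. All names (`Governance`, `select_governance`, the toy agents) are hypothetical, and the agents are plain functions standing in for LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption: an agent maps a task or draft to a new draft.
# A real system would wrap an LLM call; these are plain functions.
Agent = Callable[[str], str]


@dataclass
class Governance:
    """Hypothetical description of one governance topology."""
    name: str
    proposer: Agent
    reviewers: List[Agent]   # an empty list means no review stage
    executor: Agent
    max_rounds: int = 1      # upper bound on review rounds

    def run(self, task: str) -> str:
        draft = self.proposer(task)
        for _ in range(self.max_rounds):
            revised = draft
            for review in self.reviewers:
                revised = review(revised)
            if revised == draft:   # no reviewer changed anything: stop early
                break
            draft = revised        # error correction: adopt the revision
        return self.executor(draft)


def select_governance(candidates, tasks, score):
    """Return the topology with the highest mean score on the given tasks."""
    def mean_score(g):
        return sum(score(t, g.run(t)) for t in tasks) / len(tasks)
    return max(candidates, key=mean_score)


# Toy task: normalise a word to stripped lowercase.
def propose(task: str) -> str:
    return task.strip()            # this proposer forgets to lowercase

def review_case(draft: str) -> str:
    return draft.lower()           # this reviewer corrects the casing

def execute(draft: str) -> str:
    return draft                   # the executor simply emits the draft

def score(task: str, output: str) -> float:
    return 1.0 if output == task.strip().lower() else 0.0


centralized = Governance("centralized", propose, [], execute)
reviewed = Governance("reviewed", propose, [review_case], execute, max_rounds=2)

tasks = ["  Hello ", "ok"]
best = select_governance([centralized, reviewed], tasks, score)
print(best.name)
```

In this toy setup the reviewed topology wins because one task needs a correction that only a reviewer supplies. With a proposer that never errs, both topologies would tie and the extra review stage would only add cost, which loosely mirrors the abstract's point that the best architecture depends on model capability and task characteristics.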