Drop the Hierarchy and Roles: How Self-Organizing LLM Agents Outperform Designed Structures

arXiv cs.AI / 4/1/2026

Key Points

  • The paper reports a 25,000-task experiment comparing 8 multi-agent LLM coordination protocols, from externally imposed hierarchies to emergent self-organization, across 8 models and 4–256 agents.
  • It finds that with only minimal scaffolding (a fixed ordering), agents spontaneously invent specialized roles, abstain from tasks outside their competence, and form shallow hierarchies, all without pre-assigned roles or external design.
  • A hybrid protocol, Sequential, enables this autonomy and outperforms centralized coordination by 14% (p<0.001); quality varies widely across protocols (44% spread; Cohen’s d=1.86, p<0.0001).
  • Emergent autonomy scales with model capability: stronger models self-organize well, while models below a capability threshold still benefit from rigid structure, implying that improving foundation models could broaden where autonomous coordination works.
  • The system scales sub-linearly to 256 agents without quality degradation (p=0.61) and produces 5,006 unique roles from just 8 agents; open-source models achieve 95% of closed-source quality at 24× lower cost.
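
For readers unfamiliar with the effect-size statistic quoted in the key points, Cohen's d is the difference between two group means divided by their pooled standard deviation. The sketch below uses made-up quality scores, not the paper's data, and the function name `cohens_d` is ours; it only illustrates how such a figure is computed.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled sample variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


# Illustrative scores only (hypothetical, NOT taken from the paper).
protocol_a = [0.80, 0.85, 0.90]
protocol_b = [0.60, 0.65, 0.70]

d = cohens_d(protocol_a, protocol_b)
print(f"Cohen's d = {d:.2f}")
```

By the usual rule of thumb, values of d above roughly 0.8 are considered large, so the reported d=1.86 indicates a very pronounced difference between protocols.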

Abstract

How much autonomy can multi-agent LLM systems sustain, and what enables it? We present a 25,000-task computational experiment spanning 8 models, 4–256 agents, and 8 coordination protocols ranging from externally imposed hierarchy to emergent self-organization. We observe that autonomous behavior already emerges in current LLM agents: given minimal structural scaffolding (fixed ordering), agents spontaneously invent specialized roles, voluntarily abstain from tasks outside their competence, and form shallow hierarchies, all without any pre-assigned roles or external design. A hybrid protocol (Sequential) that enables this autonomy outperforms centralized coordination by 14% (p<0.001), with a 44% quality spread between protocols (Cohen's d=1.86, p<0.0001). The degree of emergent autonomy scales with model capability: strong models self-organize effectively, while models below a capability threshold still benefit from rigid structure, suggesting that as foundation models improve, the scope for autonomous coordination will expand. The system scales sub-linearly to 256 agents without quality degradation (p=0.61), producing 5,006 unique roles from just 8 agents. Results replicate across closed- and open-source models, with open-source achieving 95% of closed-source quality at 24× lower cost. The practical implication: give agents a mission, a protocol, and a capable model, not a pre-assigned role.
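
The abstract describes the Sequential protocol only at a high level: a fixed ordering of agents, with each agent free to choose a role or abstain. A minimal sketch of that idea follows. It is an assumption about how such a loop could look, not the authors' implementation. All names here (`Contribution`, `run_sequential`, `stub_policy`) are hypothetical, and the stub policy stands in for what would be an LLM call in a real system.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Contribution:
    agent_id: int
    role: str      # role the agent chose for itself
    output: str    # what the agent produced in that role


# Hypothetical policy signature: given its id, the mission, and the prior
# contributions, an agent returns (role, output) or None to abstain.
Policy = Callable[[int, str, List[Contribution]], Optional[Tuple[str, str]]]


def run_sequential(mission: str, policies: List[Policy]) -> List[Contribution]:
    """Visit agents in a fixed order; each sees the history and may abstain."""
    history: List[Contribution] = []
    for agent_id, policy in enumerate(policies):
        decision = policy(agent_id, mission, history)
        if decision is None:
            continue  # voluntary abstention: the agent adds nothing
        role, output = decision
        history.append(Contribution(agent_id, role, output))
    return history


def stub_policy(agent_id, mission, history):
    """Stand-in for an LLM: take the first role nobody has claimed yet."""
    taken = {c.role for c in history}
    for role in ("planner", "implementer", "reviewer"):
        if role not in taken:
            return role, f"{role} output for: {mission}"
    return None  # nothing useful left to do, so abstain


result = run_sequential("summarize a report", [stub_policy] * 4)
roles = [c.role for c in result]
print(roles)
```

The only structure imposed from outside is the visiting order. Roles are not passed in; they appear in the history because each agent picks one after seeing what earlier agents did, which is the property the abstract attributes to the protocol.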