Beyond Arrow's Impossibility: Fairness as an Emergent Property of Multi-Agent Collaboration

arXiv cs.CL · April 16, 2026

Key Points

  • The paper argues that fairness in language-model settings may emerge from multi-agent interaction rather than being guaranteed by a single centrally optimized model.
  • Using a controlled hospital triage scenario with two negotiating agents across structured debate rounds, the study shows that an agent’s ethical “alignment” (via RAG to a chosen framework) strongly influences negotiation strategies and allocation outcomes.
  • It finds that neither agent achieves ethical adequacy on its own, but their combined final allocation can meet fairness criteria that neither would reach in isolation.
  • The authors observe that aligned agents partially reduce bias through contestation (corrective negotiation) rather than fully overriding the biased agent, and that even aligned agents retain intrinsic biases tied to framework preferences.
  • The results connect this behavior to Arrow’s Impossibility Theorem, suggesting that multi-agent deliberation can navigate unsatisfiable collective-choice constraints, and that fairness should be evaluated at the system/procedure level rather than per-agent.
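Since the last point leans on it, it helps to recall what Arrow's theorem actually rules out. With three or more alternatives, no rule for aggregating individual rankings into a collective ranking can satisfy all four of the following conditions at once, which is the sense in which a deliberation procedure can only trade these desiderata off rather than satisfy them all:

```latex
% Arrow's Impossibility Theorem, statement only.
% Let X be a set of alternatives with |X| >= 3, and let
% F : L(X)^n \to L(X) map n agents' rankings to a social ranking.
\begin{itemize}
  \item \textbf{Unrestricted domain:} $F$ is defined on every profile of individual rankings.
  \item \textbf{Weak Pareto:} if every agent ranks $a$ above $b$, then $F$ ranks $a$ above $b$.
  \item \textbf{Independence of irrelevant alternatives:} the social ranking of $a$ vs.\ $b$
        depends only on the agents' rankings of $a$ vs.\ $b$.
  \item \textbf{Non-dictatorship:} no single agent's ranking determines $F$'s output on every profile.
\end{itemize}
% Arrow (1951): no such F satisfies all four conditions simultaneously.
```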

Abstract

Fairness in language models is typically studied as a property of a single, centrally optimized model. As large language models become increasingly agentic, we propose that fairness emerges through interaction and exchange. We study this via a controlled hospital triage framework in which two agents negotiate over three structured debate rounds. One agent is aligned to a specific ethical framework via retrieval-augmented generation (RAG), while the other is either unaligned or adversarially prompted to favor demographic groups over clinical need. We find that alignment systematically shapes negotiation strategies and allocation patterns, and that neither agent's allocation is ethically adequate in isolation, yet their joint final allocation can satisfy fairness criteria that neither would have reached alone. Aligned agents partially moderate bias through contestation rather than override, acting as corrective patches that restore access for marginalized groups without fully converting a biased counterpart. We further observe that even explicitly aligned agents exhibit intrinsic biases toward certain frameworks, consistent with known left-leaning tendencies in LLMs. We connect these limits to Arrow's Impossibility Theorem: no aggregation mechanism can simultaneously satisfy all desiderata of collective rationality, and multi-agent deliberation navigates rather than resolves this constraint. Our results reposition fairness as an emergent, procedural property of decentralized agent interaction, and the system rather than the individual agent as the appropriate unit of evaluation.
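The dynamic the abstract describes, an aligned agent contesting rather than overriding a biased counterpart across debate rounds, can be illustrated with a deliberately simple numerical toy. This is not the paper's implementation: the agents here are plain scoring functions (no LLMs, no RAG), and the patient names, needs, groups, and the halfway-pull update rule are all illustrative assumptions. It only shows how a joint allocation can rank by clinical need even though the biased agent alone would not, while residual bias survives in the scores:

```python
# Toy sketch: system-level fairness from two-agent negotiation.
# All names, numbers, and the update rule are illustrative assumptions,
# not the paper's actual method.

def biased_proposal(need, group, favored="A"):
    """Adversarial agent: inflates priority for its favored demographic group."""
    return need + (0.5 if group == favored else 0.0)

def aligned_proposal(need, group):
    """Aligned agent: scores strictly by clinical need."""
    return need

def negotiate(patients, rounds=3):
    """Structured debate rounds: each round, the aligned agent contests the
    standing scores by pulling them halfway toward need-based scores
    (contestation, not override)."""
    scores = {p: biased_proposal(need, grp) for p, (need, grp) in patients.items()}
    for _ in range(rounds):
        scores = {p: 0.5 * (scores[p] + aligned_proposal(need, grp))
                  for p, (need, grp) in patients.items()}
    return scores

patients = {  # name: (clinical need, demographic group) -- hypothetical
    "p1": (0.9, "B"),  # highest need, non-favored group
    "p2": (0.6, "A"),
    "p3": (0.3, "A"),
}

biased_only = {p: biased_proposal(need, grp) for p, (need, grp) in patients.items()}
final = negotiate(patients)
ranking = sorted(final, key=final.get, reverse=True)
```

In this toy run the biased agent alone would put p2 first, while the negotiated ranking orders patients by need (p1, p2, p3); yet p2's and p3's final scores remain above their raw needs, mirroring the paper's finding that aligned agents act as corrective patches without fully converting the biased counterpart.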