Preserving Disagreement: Architectural Heterogeneity and Coherence Validation in Multi-Agent Policy Simulation

arXiv cs.AI / April 30, 2026


Key Points

  • The paper identifies an “artificial consensus” problem in LLM-based multi-agent policy simulation, where evaluator agents converge on the same option despite differing value perspectives.
  • It proposes the AI Council, a three-phase multi-agent deliberation framework, and evaluates it via 120 deliberations across two policy scenarios.
  • Architectural heterogeneity—assigning different 7–9B parameter models to different value perspectives—substantially reduces first-choice concentration versus a homogeneous baseline.
  • Coherence validation—using a frontier model to judge whether each evaluator’s reasoning aligns with its assigned values—introduces a fidelity–diversity tradeoff that can either further reduce or unexpectedly increase convergence depending on the scenario.
  • The authors also report negative results from three failed Delphi designs, find that 8B models respond to counter-arguments in a binary rather than graded way, and introduce the “trustworthy tension rate” as a diagnostic for small-model deliberation.

Abstract

Multi-agent deliberation systems using large language models (LLMs) are increasingly proposed for policy simulation, yet they suffer from artificial consensus: evaluator agents converge on the same option regardless of their assigned value perspectives. We present the AI Council, a three-phase deliberation framework, and conduct 120 deliberations across two policy scenarios to test two interventions. First, architectural heterogeneity (assigning a different 7–9B parameter model to each value perspective) significantly reduces first-choice concentration compared to a homogeneous baseline (child welfare: 70.9% to 46.1%, p < 0.001, r = 0.58; housing: 46.0% to 22.9%, p < 0.001, r = 0.50). This contrasts with accuracy-oriented multi-agent debate, where heterogeneity does not reduce convergence, suggesting model diversity operates differently when no objectively correct answer exists. Second, coherence validation (using a frontier model to assess whether each evaluator's reasoning is grounded in its assigned values) reveals a fidelity–diversity tradeoff: on a scenario with a dominant option, it further reduces concentration (46.1% to 40.8%, p = 0.004), but on a scenario with genuinely competitive options, it increases concentration (22.9% to 26.6%, p = 0.96) by amplifying high-coherence evaluators who cluster on one option. This tradeoff may be a general property of multi-agent systems employing quality weighting. We report negative results from three failed Delphi designs, demonstrate that 8B models exhibit binary rather than graded responses to counter-arguments, and propose the trustworthy tension rate as a diagnostic measure of small-model deliberation capabilities.
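
Illustrative Sketches

The headline metric throughout is first-choice concentration. The abstract does not spell out the formula, but a natural reading is the share of evaluator first choices captured by the modal option in a deliberation, averaged across runs. A minimal Python sketch under that assumption (the function name and data layout are illustrative, not the authors' code):

```python
from collections import Counter

def first_choice_concentration(deliberations: list[list[str]]) -> float:
    """Mean share of evaluators whose first choice matches the modal option.

    `deliberations` is a list of runs; each run lists the option that each
    evaluator agent ranked first. Lower values mean more preserved
    disagreement.
    """
    shares = []
    for first_choices in deliberations:
        modal_count = Counter(first_choices).most_common(1)[0][1]
        shares.append(modal_count / len(first_choices))
    return sum(shares) / len(shares)

# Five evaluators, four of whom rank "A" first: concentration 0.8.
print(first_choice_concentration([["A", "A", "A", "A", "B"]]))
```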
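
The heterogeneity intervention is, mechanically, a change to the evaluator-to-model mapping: the homogeneous baseline serves every value perspective with the same small model, while the heterogeneous condition assigns each perspective its own 7–9B model. A sketch with placeholder perspective labels, model names, and query interface (none of these names come from the paper):

```python
import random

# Placeholder roster; the paper's actual perspectives and models differ.
PERSPECTIVE_MODELS = {
    "child_safety": "open-model-a-7b",
    "family_autonomy": "open-model-b-8b",
    "fiscal_prudence": "open-model-c-9b",
}

def query_model(model: str, perspective: str, options: list[str]) -> str:
    """Stub for an LLM call: prompt `model` to rank `options` from
    `perspective`'s value frame and return its first choice."""
    return random.choice(options)  # stand-in for a real model response

def run_deliberation(options: list[str], homogeneous: bool = False) -> list[str]:
    """Collect one first choice per value perspective."""
    first_choices = []
    for perspective, model in PERSPECTIVE_MODELS.items():
        if homogeneous:
            model = "open-model-a-7b"  # baseline: one model everywhere
        first_choices.append(query_model(model, perspective, options))
    return first_choices

print(run_deliberation(["A", "B", "C"], homogeneous=True))
```

The contrast with accuracy-oriented debate is worth keeping in mind here: with no objectively correct answer to converge on, distinct model priors give each value perspective a genuinely different evaluator rather than a noisier copy of the same one.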
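
Coherence validation can be read as a quality-weighting step: a frontier model scores how well each evaluator's reasoning is grounded in its assigned values, and those scores re-weight the aggregate. The fidelity–diversity tradeoff falls out directly: if the high-coherence evaluators happen to cluster on one option, up-weighting them raises concentration. A sketch assuming scalar coherence scores in [0, 1] and weighted vote shares (both assumptions; the paper's scoring rubric and aggregation rule may differ):

```python
from collections import defaultdict

def weighted_first_choice_shares(
    first_choices: list[str],
    coherence: list[float],
) -> dict[str, float]:
    """Share of total coherence-weighted vote mass per option.

    `coherence[i]` is an assumed frontier-model score in [0, 1] for how
    well evaluator i's reasoning is grounded in its assigned values.
    """
    mass: dict[str, float] = defaultdict(float)
    for option, weight in zip(first_choices, coherence):
        mass[option] += weight
    total = sum(mass.values())
    return {option: m / total for option, m in mass.items()}

choices = ["A", "A", "A", "B", "B"]
# Unweighted, "A" holds 60% of first choices...
print(weighted_first_choice_shares(choices, [1.0] * 5))
# ...but if the three "A" voters score high on coherence, weighting
# amplifies the cluster and "A"'s share rises to about 77%.
print(weighted_first_choice_shares(choices, [0.9, 0.9, 0.9, 0.4, 0.4]))
```

This is why the same mechanism can cut both ways: on the child-welfare scenario (a dominant option) it further reduced concentration, while on the competitive housing scenario the high-coherence cluster pulled concentration back up.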