Hear Both Sides: Efficient Multi-Agent Debate via Diversity-Aware Message Retention

arXiv cs.CL / 3/24/2026

Key Points

  • The paper argues that standard multi-agent debate, which broadcasts every agent’s message each round, adds noise and redundancy that can hurt reasoning quality and waste compute.
  • It proposes Diversity-Aware Retention (DAR), which retains and broadcasts only a subset of agent responses chosen to maximally disagree with each other and with the majority vote.
  • DAR uses an index-based message retention mechanism that forwards original, unmodified agent messages to keep retained disagreements authentic.
  • Experiments on multiple reasoning and question-answering benchmarks show that selective propagation consistently improves debate performance, with the largest gains as the number of agents grows and noise accumulates.
  • The work emphasizes that in multi-agent LLM systems, controlling what agents “hear” can be as important as the content they generate.

Abstract

Multi-Agent Debate has emerged as a promising framework for improving the reasoning quality of large language models through iterative inter-agent communication. However, broadcasting all agent messages at every round introduces noise and redundancy that can degrade debate quality and waste computational resources. Current approaches rely on uncertainty estimation to filter low-confidence responses before broadcasting, but this approach is unreliable due to miscalibrated confidence scores and sensitivity to threshold selection. To address this, we propose Diversity-Aware Retention (DAR), a lightweight debate framework that, at each debate round, selects the subset of agent responses that maximally disagree with each other and with the majority vote before broadcasting. Through an explicit index-based retention mechanism, DAR preserves the original messages without modification, ensuring that retained disagreements remain authentic. Experiments on diverse reasoning and question answering benchmarks demonstrate that our selective message propagation consistently improves debate performance, particularly as the number of agents scales, where noise accumulation is most severe. Our results highlight that what agents hear is as important as what agents say in multi-agent reasoning systems.
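
The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give DAR's exact scoring rule, so everything below is an assumption made for illustration: disagreement is measured as a simple mismatch between extracted final answers, the subset is built greedily, and the names `majority_answer`, `select_retained_indices`, `broadcast`, and the budget `k` are hypothetical. The one property taken directly from the source is that selection returns indices, so the retained messages are forwarded verbatim.

```python
from collections import Counter
from typing import List


def majority_answer(answers: List[str]) -> str:
    """Return the most common final answer (ties go to the first one seen)."""
    return Counter(answers).most_common(1)[0][0]


def select_retained_indices(answers: List[str], k: int) -> List[int]:
    """Greedily pick up to k agent indices whose answers disagree the most.

    Assumption (not from the paper): an agent's score is 1 if its answer
    differs from the majority vote, plus 1 for every already-retained
    agent whose answer it differs from.
    """
    majority = majority_answer(answers)
    retained: List[int] = []
    remaining = list(range(len(answers)))

    while remaining and len(retained) < k:
        def score(i: int) -> int:
            vs_majority = int(answers[i] != majority)
            vs_retained = sum(answers[i] != answers[j] for j in retained)
            return vs_majority + vs_retained

        best = max(remaining, key=score)  # first index wins on ties
        retained.append(best)
        remaining.remove(best)

    return sorted(retained)


def broadcast(messages: List[str], answers: List[str], k: int) -> List[str]:
    """Index-based retention: forward the original messages, unmodified."""
    return [messages[i] for i in select_retained_indices(answers, k)]


if __name__ == "__main__":
    # Hypothetical round with five agents; answers are the extracted final answers.
    messages = ["msg0", "msg1", "msg2", "msg3", "msg4"]
    answers = ["A", "A", "B", "A", "C"]
    print(select_retained_indices(answers, k=3))
    print(broadcast(messages, answers, k=2))
```

With this scoring, the two minority answers are picked first and a single majority representative is added once the budget allows, so redundant copies of the majority view are dropped while each distinct position is still heard. A real system would likely compare full responses rather than exact answer strings; that choice is left open here.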