Representational Collapse in Multi-Agent LLM Committees: Measurement and Diversity-Aware Consensus

arXiv cs.LG / 4/7/2026

Key Points

Multi-agent LLM “committees” that reuse the same underlying model with different role prompts can suffer from representational collapse: the agents’ chain-of-thought rationales become highly similar, which undermines the assumption behind majority-vote aggregation that agents contribute complementary evidence.
Using three Qwen2.5-14B agents on 100 GSM8K problems, the study finds high mean pairwise cosine similarity (0.888) and low effective rank (2.17/3), indicating reduced diversity among agents.
The paper proposes DALC, a training-free diversity-aware consensus protocol that computes diversity weights from embedding geometry. It reaches 87% on GSM8K versus 84% for self-consistency, at 26% lower token cost.
Ablations show that hint sharing contributes more than diversity weighting alone, that run-to-run variance is 1–3 points per protocol, and that the choice of embedding encoder changes both measured collapse severity (cosine 0.908 with mxbai versus 0.888 with nomic) and downstream accuracy.
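The two collapse metrics quoted above can be computed from a small matrix of rationale embeddings, one row per agent. The sketch below is illustrative only. It assumes that "effective rank" means the exponential of the Shannon entropy of the normalized singular values, a common definition that this summary does not confirm, and that rows are unit-normalized first. The function names are hypothetical.

```python
import numpy as np


def mean_pairwise_cosine(embeddings: np.ndarray) -> float:
    """Mean cosine similarity over all distinct pairs of rows (agents)."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T
    # Indices of the strict upper triangle: each unordered pair once.
    upper = np.triu_indices(len(unit), k=1)
    return float(sims[upper].mean())


def effective_rank(embeddings: np.ndarray) -> float:
    """Entropy-based effective rank (ASSUMED definition, not from the source).

    Returns exp(H(p)), where p is the singular-value spectrum of the
    unit-normalized embedding matrix, rescaled to sum to 1.
    """
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    singular_values = np.linalg.svd(unit, compute_uv=False)
    p = singular_values / singular_values.sum()
    p = p[p > 0]  # drop exact zeros so log() is defined
    return float(np.exp(-(p * np.log(p)).sum()))
```

Under this definition, three mutually orthogonal rationales give an effective rank of 3 and three identical rationales give 1, so the reported 2.17 out of 3 sits between the two extremes.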

Abstract

Multi-agent LLM committees replicate the same model under different role prompts and aggregate outputs by majority vote, implicitly assuming that agents contribute complementary evidence. We embed each agent's chain-of-thought rationale and measure pairwise similarity: across 100 GSM8K questions with three Qwen2.5-14B agents, mean cosine similarity is 0.888 and effective rank is 2.17 out of 3.0, a failure mode we term representational collapse. DALC, a training-free consensus protocol that computes diversity weights from embedding geometry, reaches 87% on GSM8K versus 84% for self-consistency at 26% lower token cost. Ablation experiments reveal 1-3 point per-protocol run-to-run variance, confirm that hint sharing contributes more than diversity weighting alone, and show that encoder choice strongly modulates collapse severity (cosine 0.908 with mxbai versus 0.888 with nomic) and downstream accuracy. The more robust finding is that collapse is measurable, worsens on harder tasks, and that the choice of embedding proxy is a first-order design decision for any latent communication protocol.
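The abstract does not give DALC's weighting formula or describe how hints are shared, so the following is not the paper's protocol. It is a minimal sketch of the general idea of diversity-weighted voting, under one hypothetical choice: each agent's weight is one minus its mean cosine similarity to the other agents, floored at a small positive value and normalized. The names `diversity_weights`, `weighted_vote`, and `floor` are assumptions made for illustration.

```python
from collections import defaultdict

import numpy as np


def diversity_weights(embeddings: np.ndarray, floor: float = 1e-3) -> np.ndarray:
    """Hypothetical weights: agents less similar to the rest count more.

    Requires at least two agents (rows).
    """
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(unit)
    # Row sum minus the self-similarity of 1, averaged over the other agents.
    mean_sim_to_others = (sims.sum(axis=1) - 1.0) / (n - 1)
    raw = np.maximum(1.0 - mean_sim_to_others, floor)
    return raw / raw.sum()


def weighted_vote(answers, weights):
    """Return the answer with the largest total weight."""
    totals = defaultdict(float)
    for answer, weight in zip(answers, weights):
        totals[answer] += float(weight)
    return max(totals, key=totals.get)
```

With uniform weights this reduces to plain majority voting. With non-uniform weights, a single agent whose rationale differs sharply from two near-duplicate agents can outweigh them, which is the behavior a diversity-aware scheme trades on.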
