The Consensus Trap: Rescuing Multi-Agent LLMs from Adversarial Majorities via Token-Level Collaboration

arXiv cs.CL / 4/21/2026


Key Points

  • The paper shows that response-level aggregation in multi-agent LLMs (e.g., Majority Voting, MAJ) is structurally vulnerable to adversarial prompt injections when corrupted agents can form a local majority.
  • It argues that majority voting fails because it aggregates fully formed responses and cannot detect or correct flawed intermediate reasoning produced by corrupted agents.
  • The authors propose Token-Level Round-Robin (RR) Collaboration, in which agents take turns generating tokens within a shared autoregressive context, interleaving their reasoning token by token.
  • Using a dynamical-systems framing, they prove that token-level interleaving changes aggregation from a brittle linear vote-sum into a non-linear operator product, enabling honest agents to “pull back” against adversarial corruption.
  • Extensive experiments across reasoning benchmarks find that MAJ collapses once corrupted agents reach a majority, while RR maintains strong accuracy well beyond that threshold.
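The contrast the points above draw can be made concrete with a minimal toy sketch. This is not the paper's implementation: the lambda "agents" are hypothetical stand-ins for LLM calls, and the token strings are illustrative only.

```python
from collections import Counter

def majority_vote(answers):
    """Response-level aggregation (MAJ): pick the most common final answer.
    Only fully formed conclusions are compared; intermediate reasoning is invisible."""
    return Counter(answers).most_common(1)[0][0]

def round_robin(agents, prompt, max_tokens):
    """Token-level round-robin (RR): agents take turns appending one token
    each to a single shared autoregressive context."""
    context = list(prompt)
    for step in range(max_tokens):
        agent = agents[step % len(agents)]  # rotate through agents
        context.append(agent(context))      # each token conditions on all prior tokens
    return context

# Hypothetical stand-in agents; a real system would call an LLM here.
honest = lambda ctx: "ok"      # honest agent emits a sound reasoning token
corrupted = lambda ctx: "bad"  # prompt-injected agent emits a corrupted token
```

With two corrupted agents and one honest agent, `majority_vote` returns the corrupted answer outright, while `round_robin` still threads honest tokens through the shared context at every round — the interleaving that the paper's robustness argument builds on.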

Abstract

Multi-agent large language model (LLM) architectures increasingly rely on response-level aggregation, such as Majority Voting (MAJ), to raise reasoning ceilings. However, in open environments, agents are highly susceptible to stealthy contextual corruption, such as targeted prompt injections. We reveal a critical structural vulnerability in current multi-agent systems: response-level aggregation collapses when corrupted agents form a local majority. Because voting aggregates fully formed conclusions, it is blind to flawed intermediate logic. To overcome this systematic limitation, we propose Token-Level Round-Robin (RR) Collaboration, where agents sequentially interleave generation within a shared autoregressive context. We formalize this process as a discrete-time dynamical system, proving that token-level interleaving transitions aggregation from a brittle counting of final votes (a linear sum) to a dynamic, interwoven chain of logic (a non-linear operator product). Through this theoretical lens, we prove that the honest model's restorative pull can overpower adversarial corruptions, even when corrupted agents form a majority. We conduct an exhaustive empirical evaluation across diverse reasoning benchmarks and demonstrate that while MAJ collapses when corrupted agents reach a majority, RR maintains robust accuracy well beyond this critical threshold.
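In rough notation (mine, not necessarily the paper's symbols), the structural contrast the abstract describes can be sketched as follows: voting is a linear sum over final answers, while round-robin generation composes per-agent update operators on one shared state.

```latex
% Majority voting aggregates final answers a_1, ..., a_n as a linear vote-sum:
\[
\mathrm{MAJ}(a_1,\dots,a_n) \;=\; \arg\max_{a} \sum_{i=1}^{n} \mathbf{1}[a_i = a]
\]
% Round-robin instead evolves one shared context x_t through per-agent
% update operators T_i, so each full round is a non-linear operator product:
\[
x_{t+1} = T_{(t \bmod n)+1}(x_t)
\quad\Longrightarrow\quad
x_{kn} = \bigl(T_n \circ \cdots \circ T_1\bigr)^{k}(x_0)
\]
```

Under this framing, a single honest operator appearing in every round can keep steering the shared state, which is the "restorative pull" the abstract invokes.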