The Consensus Trap: Rescuing Multi-Agent LLMs from Adversarial Majorities via Token-Level Collaboration
arXiv cs.CL / 4/21/2026
Key Points
- The paper shows that response-level aggregation in multi-agent LLMs (e.g., Majority Voting) is structurally vulnerable to adversarial prompt injections when corrupted agents can form a local majority.
- It argues that majority voting fails because it aggregates fully formed responses and cannot detect or correct flawed intermediate reasoning produced by corrupted agents.
- The authors propose Token-Level Round-Robin (RR) Collaboration, in which agents take turns generating individual tokens in a single shared autoregressive context, so their reasoning is interleaved rather than aggregated after the fact.
- Using a dynamical-systems framing, they prove that token-level interleaving changes aggregation from a brittle linear vote-sum into a non-linear operator product, enabling honest agents to “pull back” against adversarial corruption.
- Extensive experiments across reasoning benchmarks find that majority voting (MAJ) collapses once the number of corrupted agents exceeds a threshold, while RR maintains strong accuracy beyond that point.
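The contrast between the two aggregation schemes can be sketched with toy stand-ins. Here agents are simple functions rather than LLMs, and all names are illustrative assumptions, not the paper's implementation: majority voting picks the most common completed answer, so a corrupted majority wins outright, whereas round-robin lets every agent, honest or not, contribute tokens to one shared context mid-generation.

```python
# Toy sketch: response-level majority voting vs. token-level round-robin.
# Agents are plain functions (context -> next token), not real LLMs.
from collections import Counter

def majority_vote(responses):
    """Response-level aggregation: pick the most frequent full answer.
    If corrupted agents form a majority, their answer wins outright."""
    answer, _ = Counter(responses).most_common(1)[0]
    return answer

def round_robin(agents, prompt, max_tokens=3):
    """Token-level aggregation: agents alternate, each appending one token
    to a single shared autoregressive context, interleaving their output."""
    context = list(prompt)
    for step in range(max_tokens):
        agent = agents[step % len(agents)]
        context.append(agent(context))
    return context

# Hypothetical scenario: 2 of 3 agents are corrupted.
honest = lambda ctx: "right"
corrupt = lambda ctx: "wrong"

print(majority_vote(["wrong", "wrong", "right"]))
# -> "wrong": the corrupted majority decides the whole response.

print(round_robin([corrupt, honest, corrupt], ["Q:"], max_tokens=3))
# -> ["Q:", "wrong", "right", "wrong"]: honest output is interleaved
#    into the shared context instead of being outvoted wholesale.
```

This only illustrates the mechanical difference (voting over finished responses vs. interleaved token generation); the paper's "pull back" guarantee additionally depends on honest agents conditioning on, and correcting, the corrupted tokens already in the context.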