DiscoUQ: Structured Disagreement Analysis for Uncertainty Quantification in LLM Agent Ensembles
arXiv cs.CL / 3/24/2026
Key Points
- The paper introduces DiscoUQ, a framework for quantifying uncertainty in LLM agent ensembles by modeling structured inter-agent disagreement rather than relying on shallow vote statistics.
- DiscoUQ extracts semantic disagreement signals from agents’ reasoning (e.g., evidence overlap, argument strength, divergence depth) and augments them with embedding-geometry features (e.g., cluster distances and dispersion).
- It presents three variants of increasing complexity (DiscoUQ-LLM, DiscoUQ-Embed, and DiscoUQ-Learn) that use logistic regression and a neural network to produce calibrated confidence estimates.
- On four benchmarks (StrategyQA, MMLU, TruthfulQA, ARC-Challenge) using a 5-agent setup with Qwen3.5-27B, DiscoUQ-LLM improves AUROC to 0.802 versus 0.791 for the best baseline while achieving better calibration (ECE 0.036 vs. 0.098).
- The approach shows strong cross-benchmark generalization and delivers the biggest gains in ambiguous cases where agents exhibit “weak disagreement” and vote counting underperforms.
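The embedding-geometry signals mentioned above (cluster distances, dispersion) can be sketched with plain NumPy. This is an illustrative reconstruction under stated assumptions, not the paper's feature set: the feature names and the choice of cosine similarity are assumptions for demonstration.

```python
import numpy as np

def disagreement_features(embeddings):
    """Illustrative embedding-geometry disagreement features for one question.

    `embeddings` is an (n_agents, dim) array of answer embeddings.
    The feature names here are hypothetical, not the paper's exact set.
    """
    # Normalize rows so cosine similarity reduces to a dot product.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    # Upper-triangle indices select each unordered agent pair once.
    iu = np.triu_indices(len(embeddings), k=1)
    pair_sims = sims[iu]
    # Dispersion: mean distance of agents from the ensemble centroid.
    centroid = normed.mean(axis=0)
    dispersion = np.linalg.norm(normed - centroid, axis=1).mean()
    return {
        "mean_pairwise_sim": float(pair_sims.mean()),
        "min_pairwise_sim": float(pair_sims.min()),
        "dispersion": float(dispersion),
    }
```

An ensemble whose five agents give semantically similar answers yields high pairwise similarity and low dispersion; divergent answers push both features the other way, which is the raw signal a downstream classifier such as DiscoUQ-Learn could consume.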
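The calibration numbers reported above (ECE 0.036 vs. 0.098) use expected calibration error, which can be computed with the standard binned estimator; the sketch below is a generic implementation, not the paper's evaluation code, and the bin count of 10 is an assumption.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: bin-weight-averaged |accuracy - mean confidence| per bin.

    `confidences` are predicted probabilities in [0, 1];
    `correct` marks whether each prediction was right.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Half-open bins (lo, hi]; fold exact zeros into the first bin.
        mask = (confidences > lo) & (confidences <= hi)
        if lo == 0.0:
            mask |= confidences == 0.0
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in bin
    return ece
```

A perfectly calibrated predictor (e.g. 75% confidence on answers that are right 75% of the time) scores 0; an overconfident one is penalized by the gap in each bin, which is what the ECE 0.036 vs. 0.098 comparison measures.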