Every Response Counts: Quantifying Uncertainty of LLM-based Multi-Agent Systems through Tensor Decomposition

arXiv cs.AI / 4/13/2026

Key Points

  • The paper highlights reliability risks in LLM-based multi-agent systems (MAS) that emerge from communication dynamics and role dependencies, which create uncertainty beyond what single-agent uncertainty methods can capture.
  • It identifies three MAS-specific challenges for uncertainty quantification: cascading uncertainty across multi-step reasoning, variability in inter-agent communication paths, and diversity of communication topologies.
  • The authors propose MATU, a framework that quantifies uncertainty by encoding full reasoning trajectories as embedding matrices and aggregating multiple execution runs into a higher-order tensor.
  • Using tensor decomposition, MATU disentangles and measures distinct uncertainty sources, producing a more comprehensive reliability metric that can generalize across different agent structures.
  • The authors report extensive experiments indicating that MATU estimates uncertainty holistically and robustly across diverse tasks and communication topologies.

Abstract

While Large Language Model-based Multi-Agent Systems (MAS) consistently outperform single-agent systems on complex tasks, their intricate interactions introduce critical reliability challenges arising from communication dynamics and role dependencies. Existing Uncertainty Quantification methods, typically designed for single-turn outputs, fail to address the unique complexities of MAS. Specifically, these methods struggle with three distinct challenges: the cascading uncertainty in multi-step reasoning, the variability of inter-agent communication paths, and the diversity of communication topologies. To bridge this gap, we introduce MATU, a novel framework that quantifies uncertainty through tensor decomposition. MATU moves beyond analyzing final text outputs by representing entire reasoning trajectories as embedding matrices and organizing multiple execution runs into a higher-order tensor. By applying tensor decomposition, we disentangle and quantify distinct sources of uncertainty, offering a comprehensive reliability measure that is generalizable across different agent structures. We provide comprehensive experiments to show that MATU effectively estimates holistic and robust uncertainty across diverse tasks and communication topologies.
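
The pipeline described in the abstract (trajectories as embedding matrices, runs stacked into a higher-order tensor, then a decomposition) can be illustrated with a small sketch. This is not the authors' implementation: the summary does not say which decomposition MATU uses or how it scores uncertainty. The sketch assumes every run is already embedded as a steps-by-dimensions matrix of equal shape, and it swaps in a simple stand-in, the singular-value spectrum of each mode unfolding (the building block of a higher-order SVD), summarized as a normalized entropy. All function names here are hypothetical.

```python
import numpy as np


def build_run_tensor(runs):
    """Stack R trajectory matrices (each steps x dim) into an (R, steps, dim) tensor.

    Assumption: all runs were padded or truncated to the same number of steps.
    """
    return np.stack([np.asarray(r, dtype=float) for r in runs], axis=0)


def unfold(tensor, mode):
    """Mode-n unfolding: bring `mode` to the front and flatten the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_spectral_entropy(tensor, mode):
    """Normalized entropy of the squared singular values of one mode unfolding.

    Close to 0: the slices along this mode are nearly identical up to scale.
    Close to 1: energy is spread evenly, i.e. the slices disagree strongly.
    This score is an illustrative stand-in, not the metric from the paper.
    """
    s = np.linalg.svd(unfold(tensor, mode), compute_uv=False)
    energy = s ** 2
    total = energy.sum()
    if total == 0.0 or energy.size < 2:
        return 0.0
    p = energy / total
    p = p[p > 1e-12]  # drop numerically zero components before taking logs
    return float(-(p * np.log(p)).sum() / np.log(energy.size))


def uncertainty_profile(tensor):
    """One score per tensor mode: execution runs, reasoning steps, embedding dims."""
    names = ("run", "step", "embedding")
    return {name: mode_spectral_entropy(tensor, m) for m, name in enumerate(names)}


# Synthetic demo: 5 runs, 4 reasoning steps, 8-dimensional embeddings.
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 8))

# Runs that almost repeat the same trajectory versus runs that are unrelated.
consistent = build_run_tensor([base + 0.01 * rng.normal(size=base.shape) for _ in range(5)])
divergent = build_run_tensor([rng.normal(size=base.shape) for _ in range(5)])

print("consistent:", uncertainty_profile(consistent))
print("divergent: ", uncertainty_profile(divergent))
```

In this toy setup the "run" score separates the two cases: near-identical runs give an almost rank-one unfolding along the run mode, while unrelated runs spread energy across several components. A real system would obtain the matrices from a sentence-embedding model applied to each agent message, which is outside the scope of this sketch.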