On the Reliability Limits of LLM-Based Multi-Agent Planning

arXiv stat.ML / 3/31/2026


Key Points

  • The paper analyzes reliability limits of LLM-based multi-agent planning by modeling it as a finite acyclic decision network where agents share context, communicate via limited-capacity language channels, and may require human review.
  • It proves that without additional (exogenous) signals, any delegated multi-agent network is decision-theoretically dominated by a centralized Bayes decision maker with the same information.
  • In a common-evidence setting, the authors show that optimizing multi-agent directed acyclic graphs with a finite communication budget can be reframed as selecting a budget-constrained stochastic experiment over the shared signal.
  • The work quantifies how communication and information compression reduce decision quality, and expresses the centralized-vs-communicated performance gap via expected posterior divergence under proper scoring rules.
  • Experiments with LLMs on a controlled problem set illustrate the theoretical characterizations of reliability loss from delegation and compression.
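
To make the dominance and compression claims above concrete, here is a minimal numerical sketch. It is not the authors' benchmark or code. The joint distribution `JOINT`, the one-bit message `compress`, and all function names are hypothetical choices made for illustration. The sketch compares a centralized decision maker that sees the full shared signal with a downstream stage that sees only a compressed message, and checks that the loss gap matches the expected posterior divergence under both logarithmic loss and the Brier score.

```python
import math

# Hypothetical joint distribution p(x, y): shared signal x in {0, 1, 2, 3},
# binary state y in {0, 1}. Illustrative numbers, not taken from the paper.
JOINT = {
    (0, 0): 0.20, (0, 1): 0.05,
    (1, 0): 0.15, (1, 1): 0.10,
    (2, 0): 0.10, (2, 1): 0.15,
    (3, 0): 0.05, (3, 1): 0.20,
}

def identity(x):
    """Centralized case: the decision maker sees the full signal."""
    return x

def compress(x):
    """Hypothetical 1-bit language channel: only the coarse half of x survives."""
    return x // 2

def posterior(joint, key):
    """Return ({k: [p(y=0|k), p(y=1|k)]}, {k: p(k)}) for k = key(x)."""
    mass = {}
    for (x, y), p in joint.items():
        mass.setdefault(key(x), [0.0, 0.0])[y] += p
    marg = {k: sum(v) for k, v in mass.items()}
    post = {k: [v[0] / marg[k], v[1] / marg[k]] for k, v in mass.items()}
    return post, marg

def log_loss(q, y):
    return -math.log(q[y])

def brier_loss(q, y):
    return sum((q[j] - (1.0 if j == y else 0.0)) ** 2 for j in range(2))

def expected_loss(joint, key, loss):
    """Expected loss when the Bayes posterior given key(x) is reported."""
    post, _ = posterior(joint, key)
    return sum(p * loss(post[key(x)], y) for (x, y), p in joint.items())

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def squared_error(p, q):
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def expected_divergence(joint, key, div):
    """E_x[ div( p(.|x), p(.|key(x)) ) ]: the expected posterior divergence."""
    post_x, p_x = posterior(joint, identity)
    post_m, _ = posterior(joint, key)
    return sum(p_x[x] * div(post_x[x], post_m[key(x)]) for x in p_x)

# Loss gap: compressed stage minus centralized Bayes decision maker.
gap_log = expected_loss(JOINT, compress, log_loss) - expected_loss(JOINT, identity, log_loss)
gap_brier = expected_loss(JOINT, compress, brier_loss) - expected_loss(JOINT, identity, brier_loss)

# The same gaps, computed as expected posterior divergences.
div_kl = expected_divergence(JOINT, compress, kl)            # conditional mutual information
div_sq = expected_divergence(JOINT, compress, squared_error)  # expected squared posterior error

print(f"log loss gap   = {gap_log:.6f}, expected KL            = {div_kl:.6f}")
print(f"Brier loss gap = {gap_brier:.6f}, expected squared error = {div_sq:.6f}")
```

For these illustrative numbers both gaps are nonnegative, and the Brier gap works out to 0.02, since each posterior moves by 0.1 per class once the signal is compressed. A different `compress` function or joint distribution changes the size of the gap but not its sign.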

Abstract

This technical note studies the reliability limits of LLM-based multi-agent planning as a delegated decision problem. We model the LLM-based multi-agent architecture as a finite acyclic decision network in which multiple stages process shared model-context information, communicate through language interfaces with limited capacity, and may invoke human review. We show that, without new exogenous signals, any delegated network is decision-theoretically dominated by a centralized Bayes decision maker with access to the same information. In the common-evidence regime, this implies that optimizing over multi-agent directed acyclic graphs under a finite communication budget can be recast as choosing a budget-constrained stochastic experiment on the shared signal. We also characterize the loss induced by communication and information compression. Under proper scoring rules, the gap between the centralized Bayes value and the value after communication admits an expected posterior divergence representation, which reduces to conditional mutual information under logarithmic loss and to expected squared posterior error under the Brier score. These results characterize the fundamental reliability limits of delegated LLM planning. Experiments with LLMs on a controlled problem set further demonstrate these characterizations.
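
The divergence representation mentioned in the abstract can be written out as follows. This is a sketch in assumed notation, not the paper's own statement: Y denotes the unknown state, X the shared signal, M a message generated from X alone, and values are expressed as expected losses so that the gap is nonnegative.

```latex
% Assumed notation (not from the source):
%   p_X = p(. | X), p_M = p(. | M), with Y -> X -> M a Markov chain.
%   S(q, y) is a proper scoring rule written as a loss,
%   S(q; p) = E_{Y ~ p}[ S(q, Y) ],  H_S(p) = S(p; p),
%   d_S(p, q) = S(q; p) - H_S(p) >= 0.
\begin{align*}
L_{\mathrm{comm}} - L_{\mathrm{cent}}
  &= \mathbb{E}\big[ S(p_M; p_X) \big] - \mathbb{E}\big[ H_S(p_X) \big]
   = \mathbb{E}\big[ d_S(p_X, p_M) \big] \;\ge\; 0, \\[4pt]
\text{log loss:}\quad
  d_S(p, q) &= \mathrm{KL}(p \,\|\, q)
  \;\Longrightarrow\;
  L_{\mathrm{comm}} - L_{\mathrm{cent}} = I(Y; X \mid M), \\[4pt]
\text{Brier score:}\quad
  d_S(p, q) &= \lVert p - q \rVert_2^2
  \;\Longrightarrow\;
  L_{\mathrm{comm}} - L_{\mathrm{cent}}
    = \mathbb{E}\big[ \lVert p_X - p_M \rVert_2^2 \big].
\end{align*}
```

The first equality uses the Markov assumption: conditional on X and M, the state Y is distributed as p_X, so the expected loss of reporting p_M is the average of S(p_M; p_X).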