ProMAS: Proactive Error Forecasting for Multi-Agent Systems Using Markov Transition Dynamics

arXiv cs.AI / 3/24/2026


Key Points

  • ProMAS is a proposed proactive error-forecasting framework for LLM-based multi-agent systems. It aims to enable real-time intervention rather than relying on post-hoc failure analysis.
  • The method uses “Causal Delta Features” to quantify semantic displacement, maps them into a quantized Vector Markov Space, and models reasoning as probabilistic Markov transitions.
  • By combining a Proactive Prediction Head with Jump Detection, ProMAS localizes errors based on risk acceleration instead of static thresholds, targeting lower intervention latency.
  • On the Who&When benchmark, ProMAS reportedly achieves 22.97% step-level accuracy while processing only 27% of the reasoning logs, a 73% reduction in data overhead, with performance comparable to reactive monitors such as MASC.
  • The authors note an accuracy trade-off versus post-hoc methods, but argue the approach better balances diagnostic precision with the real-time needs of autonomous multi-agent reasoning.
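
The pipeline in the second key point (displacement features, quantization into discrete states, Markov transitions) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of plain step-to-step embedding differences as a stand-in for Causal Delta Features, the nearest-centroid quantizer, and the Laplace smoothing constant `alpha` are all assumptions made for illustration.

```python
def delta_features(embeddings):
    """Difference between consecutive step embeddings.

    Assumption: a simple stand-in for 'semantic displacement'; the paper's
    Causal Delta Features may be defined differently.
    """
    return [[b - a for a, b in zip(prev, cur)]
            for prev, cur in zip(embeddings, embeddings[1:])]


def quantize(vec, centroids):
    """Map a vector to the index of its nearest centroid (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((v - c) ** 2 for v, c in zip(vec, centroids[k])),
    )


def transition_matrix(state_seqs, n_states, alpha=1.0):
    """Estimate row-stochastic transition probabilities from state sequences.

    alpha is additive (Laplace) smoothing, so unseen transitions keep a
    small nonzero probability.
    """
    counts = [[alpha] * n_states for _ in range(n_states)]
    for seq in state_seqs:
        for s, t in zip(seq, seq[1:]):
            counts[s][t] += 1
    return [[c / sum(row) for c in row] for row in counts]


# Hypothetical 2-D step embeddings for one reasoning trace.
embeddings = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.1], [1.5, 1.4]]
# Hypothetical codebook: state 0 = small displacement, state 1 = large displacement.
centroids = [[0.1, 0.05], [1.0, 1.0]]

states = [quantize(d, centroids) for d in delta_features(embeddings)]
P = transition_matrix([states], n_states=len(centroids))
print(states)  # [0, 0, 1]
print(P)
```

In this toy trace the last step moves far from the previous one, so it lands in the "large displacement" state. With many traces, the estimated matrix gives the probability of moving from ordinary steps into that state.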

Abstract

The integration of Large Language Models into Multi-Agent Systems (MAS) has enabled the solution of complex, long-horizon tasks through collaborative reasoning. However, this collective intelligence is inherently fragile, as a single logical fallacy can rapidly propagate and lead to system-wide failure. Most current research relies on post-hoc failure analysis, thereby hindering real-time intervention. To address this, we propose PROMAS, a proactive framework utilizing Markov transitions for predictive error analysis. PROMAS extracts Causal Delta Features to capture semantic displacement, mapping them to a quantized Vector Markov Space to model reasoning as probabilistic transitions. By integrating a Proactive Prediction Head with Jump Detection, the method localizes errors via risk acceleration rather than static thresholds. On the Who&When benchmark, PROMAS achieves 22.97% step-level accuracy while processing only 27% of reasoning logs. This performance rivals reactive monitors like MASC while reducing data overhead by 73%. Although this strategy entails an accuracy trade-off compared to post-hoc methods, it significantly improves intervention latency, balancing diagnostic precision with the real-time demands of autonomous reasoning.
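
To make the contrast between "risk acceleration" and "static thresholds" concrete, here is a small sketch under stated assumptions. The abstract does not specify how the Proactive Prediction Head scores risk or how Jump Detection is computed, so the per-step risk values, the second-difference definition of acceleration, and the `min_accel` and `threshold` parameters below are hypothetical.

```python
def risk_acceleration(risk):
    """Second discrete difference: a[t] = r[t] - 2*r[t-1] + r[t-2], for t >= 2."""
    return [risk[t] - 2 * risk[t - 1] + risk[t - 2] for t in range(2, len(risk))]


def first_jump(risk, min_accel=0.15):
    """Earliest step index whose risk acceleration exceeds min_accel, else None."""
    for offset, accel in enumerate(risk_acceleration(risk)):
        if accel > min_accel:
            return offset + 2  # acceleration is first defined at step 2
    return None


def first_over_threshold(risk, threshold=0.8):
    """Earliest step index whose risk level reaches a static threshold, else None."""
    for t, r in enumerate(risk):
        if r >= threshold:
            return t
    return None


# Hypothetical per-step risk scores from some upstream predictor.
risk = [0.05, 0.06, 0.08, 0.10, 0.35, 0.60, 0.85, 0.90]

print(first_jump(risk))            # 4
print(first_over_threshold(risk))  # 6
```

In this toy trace the acceleration rule fires at step 4, when the risk curve bends upward, while the static threshold fires at step 6, after the risk is already high. The margin `min_accel` is still a tunable parameter, but it applies to the change in slope rather than to the risk level itself.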