M2-PALE: A Framework for Explaining Multi-Agent MCTS--Minimax Hybrids via Process Mining and LLMs

arXiv cs.AI / 4/17/2026


Key Points

  • The paper proposes M2-PALE, a framework for explaining the decisions of multi-agent Monte-Carlo Tree Search (MCTS) agents whose rollout phase is augmented with Minimax search.
  • It addresses a known weakness of standard MCTS, namely highly selective tree construction that can omit crucial moves and fall into tactical traps, by adding shallow, full-width Minimax search to the rollout phase for greater strategic depth.
  • To make the resulting decision logic understandable, the framework uses process mining (Alpha Miner, iDHM, and Inductive Miner) to extract behavioral workflows from agent execution traces.
  • Those extracted process models are then synthesized by LLMs to produce human-readable causal and distal explanations for end users.
  • The approach is validated in a small-scale checkers environment, with the authors claiming it provides a scalable basis for interpreting hybrid agents in more complex strategic domains.
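
The rollout idea in the second point can be illustrated on a toy game. The sketch below is not the authors' implementation. It assumes a Nim-like game (take 1 to 3 stones, whoever takes the last stone wins) in place of checkers, and the names `legal_moves`, `minimax`, `informed_rollout_move`, and `rollout` are hypothetical. It only shows how a shallow, full-width lookahead can replace a purely random rollout policy.

```python
import random


def legal_moves(stones):
    """Moves available in the assumed toy game: take 1, 2, or 3 stones."""
    return [m for m in (1, 2, 3) if m <= stones]


def minimax(stones, depth, maximizing):
    """Depth-limited, full-width minimax.

    Values are from the maximizing player's perspective:
    +1 win, -1 loss, 0 when the depth cutoff is hit before the game ends.
    """
    if stones == 0:
        # The player who just moved took the last stone and wins.
        return -1 if maximizing else 1
    if depth == 0:
        return 0
    values = [
        minimax(stones - m, depth - 1, not maximizing)
        for m in legal_moves(stones)
    ]
    return max(values) if maximizing else min(values)


def informed_rollout_move(stones, depth, rng):
    """Pick a rollout move by shallow minimax, breaking ties at random."""
    scored = [
        (minimax(stones - m, depth - 1, False), m)
        for m in legal_moves(stones)
    ]
    best = max(value for value, _ in scored)
    return rng.choice([m for value, m in scored if value == best])


def rollout(stones, depth, rng):
    """Play the game out; return 0 if the first player to move wins, else 1."""
    player = 0
    while True:
        stones -= informed_rollout_move(stones, depth, rng)
        if stones == 0:
            return player
        player = 1 - player


rng = random.Random(0)
move = informed_rollout_move(5, depth=2, rng=rng)
winner = rollout(5, depth=2, rng=rng)
```

With 5 stones and a lookahead of depth 2, the policy takes a single stone, because taking 2 or 3 would let the opponent finish the game on the next move. A uniformly random rollout would walk into that trap two times out of three.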

Abstract

Monte-Carlo Tree Search (MCTS) is a fundamental sampling-based search algorithm widely used for online planning in sequential decision-making domains. Despite its success in driving recent advances in artificial intelligence, understanding the behavior of MCTS agents remains a challenge for both developers and users. This difficulty stems from the complex search trees produced through the simulation of numerous future states and their intricate relationships. A known weakness of standard MCTS is its reliance on highly selective tree construction, which may lead to the omission of crucial moves and a vulnerability to tactical traps. To resolve this, we incorporate shallow, full-width Minimax search into the rollout phase of multi-agent MCTS to enhance strategic depth. Furthermore, to demystify the resulting decision-making logic, we introduce M2-PALE (MCTS--Minimax Process-Aided Linguistic Explanations). This framework employs process mining techniques, specifically the Alpha Miner, iDHM, and Inductive Miner algorithms, to extract underlying behavioral workflows from agent execution traces. These process models are then synthesized by LLMs to generate human-readable causal and distal explanations. We demonstrate the efficacy of our approach in a small-scale checkers environment, establishing a scalable foundation for interpreting hybrid agents in increasingly complex strategic domains.
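
The process mining step starts from execution traces. As a rough illustration, the sketch below computes the directly-follows counts that discovery algorithms such as Alpha Miner and heuristics-based miners build on, plus the standard Heuristics Miner dependency measure. The activity names (`select`, `expand`, `minimax_rollout`, `backpropagate`) and the trace layout are assumptions made for this example, since the abstract does not specify the event log schema. The code does not reproduce the paper's pipeline.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def dependency(counts, a, b):
    """Heuristics Miner dependency measure between a and b, in (-1, 1)."""
    ab = counts[(a, b)]
    ba = counts[(b, a)]
    return (ab - ba) / (ab + ba + 1)


# Hypothetical traces: one list of activity labels per search iteration.
traces = [
    ["select", "expand", "minimax_rollout", "backpropagate"],
    ["select", "expand", "minimax_rollout", "backpropagate"],
    ["select", "minimax_rollout", "backpropagate"],
]

dfg = directly_follows(traces)
strength = dependency(dfg, "select", "expand")
```

A mined model of this kind, whether a Petri net, a process tree, or a dependency graph, is the sort of structured summary that could then be passed to an LLM as context for generating explanations.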