NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search

arXiv cs.LG / 5/4/2026

Key Points

  • Multi-agent Monte Carlo Tree Search (MCTS) explores inefficiently because the number of joint actions grows exponentially with the number of agents, which limits performance under practical search budgets.
  • NonZero addresses this by replacing direct exploration of the full joint-action space with surrogate-guided selection over a low-dimensional nonlinear representation.
  • The method uses an interaction-aware proposal rule: it ranks single-agent deviations by predicted gain and scores two-agent deviations with a mixed-difference interaction metric to capture coordination benefits.
  • NonZero formulates candidate proposals as a bandit problem over local deviations and provides a sublinear local-regret guarantee for reaching approximate graph-local optima without enumerating joint actions.
  • Experiments on MatGame, SMAC, and SMACv2 show improved sample efficiency and final performance compared with strong model-based and model-free baselines under matched search budgets.
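
The interaction-aware scoring described in the points above can be illustrated on a toy matrix game. This is a minimal sketch, not the paper's implementation: it assumes the mixed-difference score takes the standard second-order finite-difference form Q(a_i', a_j') - Q(a_i', a_j) - Q(a_i, a_j') + Q(a_i, a_j), and the payoff table and the helper names (value, deviate, single_gain, mixed_difference) are hypothetical. A fixed table stands in for the learned surrogate.

```python
import itertools
import numpy as np

# Hypothetical 2-agent cooperative payoff table (an assumption for illustration):
# payoff[a0, a1] is the shared reward for the joint action (a0, a1).
payoff = np.array([
    [5.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 8.0],
])

def value(joint):
    """Stand-in for a surrogate's predicted value of a joint action."""
    return float(payoff[tuple(joint)])

def deviate(joint, changes):
    """Return a copy of `joint` with the given {agent: action} changes applied."""
    new = list(joint)
    for agent, action in changes.items():
        new[agent] = action
    return tuple(new)

def single_gain(joint, i, ai):
    """Predicted gain when only agent i switches to action ai."""
    return value(deviate(joint, {i: ai})) - value(joint)

def mixed_difference(joint, i, ai, j, aj):
    """Second-order mixed difference for agents i and j (assumed form)."""
    both = value(deviate(joint, {i: ai, j: aj}))
    only_i = value(deviate(joint, {i: ai}))
    only_j = value(deviate(joint, {j: aj}))
    return both - only_i - only_j + value(joint)

joint = (0, 0)  # incumbent joint action, value 5.0

# Best gain any single agent can achieve by deviating alone.
best_single = max(
    single_gain(joint, i, a)
    for i in range(2)
    for a in range(3)
    if a != joint[i]
)

# Mixed-difference score for every two-agent deviation.
pair_scores = {
    (a0, a1): mixed_difference(joint, 0, a0, 1, a1)
    for a0, a1 in itertools.product(range(3), repeat=2)
    if a0 != joint[0] and a1 != joint[1]
}
best_pair = max(pair_scores, key=pair_scores.get)

print("best single-agent gain:", best_single)
print("top-scored pair deviation:", best_pair, pair_scores[best_pair])
```

From the incumbent (0, 0), every single-agent deviation lowers the payoff, yet the pair deviation to (2, 2) receives the highest mixed-difference score and is also a real improvement (8 versus 5). Note that the score measures non-additivity rather than gain, so in this sketch a high-scoring pair is only a candidate worth evaluating, not a guaranteed improvement.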

Abstract

Monte Carlo Tree Search (MCTS) scales poorly in cooperative multi-agent domains because expansion must consider an exponentially large set of joint actions, severely limiting exploration under realistic search budgets. We propose NonZero, which keeps multi-agent MCTS tractable by running surrogate-guided selection over a low-dimensional nonlinear representation using an interaction-guided proposal rule, instead of directly exploring the full joint-action space. Our exploration uses an interaction score: single-agent deviations are ranked by predicted gain, while two-agent deviations are scored by a mixed-difference measure that reveals coordination benefits even when no single agent can improve alone. We formalize candidate proposal as a bandit problem over local deviations and derive a proposal rule, NonZero, with a sublinear local-regret guarantee for reaching approximate graph-local optima without enumerating the joint-action space. Empirically, NonZero improves sample efficiency and final performance on MatGame, SMAC, and SMACv2 relative to strong model-based and model-free baselines under matched search budgets.
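
To make the "bandit problem over local deviations" framing concrete, the sketch below treats each single-agent or two-agent deviation from an incumbent joint action as an arm and allocates a fixed evaluation budget with UCB1. UCB1 is swapped in here as a familiar bandit rule; it is not the NonZero proposal rule and does not carry the paper's local-regret guarantee. The value function, the noise model, and all names (true_value, local_deviations, apply_deviation, ucb_propose) are assumptions made for illustration.

```python
import itertools
import math
import numpy as np

def true_value(joint):
    """Hypothetical 3-agent, binary-action value: agents 0 and 1 must coordinate."""
    if joint[0] == 1 and joint[1] == 1:
        return 3.0
    if sum(joint) == 0:
        return 1.0
    return 0.0

def local_deviations(n_agents, max_size=2):
    """Arms: every set of at most `max_size` agents that changes its action."""
    arms = []
    for size in range(1, max_size + 1):
        arms.extend(itertools.combinations(range(n_agents), size))
    return arms

def apply_deviation(joint, arm):
    """Flip the binary action of each agent listed in `arm`."""
    return tuple(1 - a if i in arm else a for i, a in enumerate(joint))

def ucb_propose(incumbent, budget, noise=0.1, c=1.0, seed=0):
    """Spend `budget` noisy evaluations over local deviations using UCB1.

    Assumes budget >= number of arms so every arm is tried at least once.
    Returns the arm with the highest estimated gain, that estimate, and pull counts.
    """
    rng = np.random.default_rng(seed)
    arms = local_deviations(len(incumbent))
    counts = np.zeros(len(arms), dtype=int)
    means = np.zeros(len(arms))
    base = true_value(incumbent)
    for t in range(1, budget + 1):
        if t <= len(arms):
            k = t - 1  # initial round: pull each arm once
        else:
            bonus = c * np.sqrt(2.0 * math.log(t) / counts)
            k = int(np.argmax(means + bonus))
        # Noisy estimate of the gain from applying this deviation.
        gain = true_value(apply_deviation(incumbent, arms[k])) - base
        gain += rng.normal(0.0, noise)
        counts[k] += 1
        means[k] += (gain - means[k]) / counts[k]  # running mean update
    best = int(np.argmax(means))
    return arms[best], float(means[best]), counts

arm, est_gain, counts = ucb_propose((0, 0, 0), budget=60)
print("proposed deviation (agents to flip):", arm)
print("pulls per arm:", counts.tolist())
```

With three binary agents there are only six arms (three single flips and three pair flips), compared with eight joint actions; the gap widens quickly with more agents, since the number of deviations of size at most two grows quadratically while the joint-action space grows exponentially. In this toy setup only the pair (0, 1) yields a positive gain, so the budget should concentrate there.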