Taming the Curses of Multiagency in Robust Markov Games with Large State Space through Linear Function Approximation

arXiv cs.LG / 5/6/2026


Key Points

  • The paper studies distributionally robust Markov games in multi-agent reinforcement learning, aiming to optimize worst-case performance under uncertainty in the environment model.
  • It targets the “curse of multiagency,” where data efficiency collapses as the joint state/action space grows rapidly with the number of agents, and notes that existing provably efficient approaches are mostly limited to small tabular settings.
  • The authors extend robust Markov game learning to large (possibly infinite) state spaces by using linear function approximation and designing provably data-efficient algorithms.
  • For uncertainty sets defined via total variation distance, the paper provides guarantees in both a generative model setting and a new online interactive setting, showing sample complexity that breaks the curse of multiagency.
  • The work claims, to the authors’ knowledge, the first results achieving improved sample complexity for robust Markov games with large state spaces independent of how the uncertainty set is constructed.
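The linear function approximation mentioned above can be illustrated with a generic least-squares sketch: represent the action-value function as Q(s, a) ≈ φ(s, a)ᵀθ for a known d-dimensional feature map φ, so the learning problem scales with d rather than with the joint state/action space. This is a minimal illustration of the general LFA idea, not the paper's algorithm; the function name and data are hypothetical.

```python
import numpy as np

def fit_linear_q(phi, targets):
    """Least-squares fit of Q(s, a) ~= phi(s, a)^T theta.

    phi:     (n, d) feature matrix for n sampled (state, action) pairs
    targets: (n,) regression targets (e.g., Bellman backup estimates)
    Returns the d-dimensional weight vector theta.
    """
    theta, *_ = np.linalg.lstsq(phi, targets, rcond=None)
    return theta

# Hypothetical toy data: 3 samples, 2-dimensional features.
phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])
targets = np.array([1.0, 2.0, 3.0])  # exactly linear in phi
theta = fit_linear_q(phi, targets)   # recovers [1.0, 2.0]
```

The point of such a parameterization is that sample complexity can then be stated in terms of the feature dimension d, which is what allows guarantees for large or infinite state spaces.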

Abstract

Multi-agent reinforcement learning (MARL) holds great potential but faces robustness challenges due to environmental uncertainty. To address this, distributionally robust Markov games (RMGs) optimize worst-case performance when the environment deviates from the nominal model within an uncertainty set. Beyond robustness, an equally urgent goal for MARL is data efficiency: because the joint state and action spaces grow exponentially with the number of agents, sampling from them can lead to the curse of multiagency. However, current provably data-efficient algorithms for RMGs are limited to tabular settings with finite state and action spaces, which are computationally manageable only for small-scale problems, leaving RMGs with large-scale (or infinite) state spaces largely unexplored. The only existing work beyond tabular settings studies linear function approximation (LFA) for a restrictive class of RMGs under a vanishing minimal value assumption, and its sample complexity still suffers from the curse of multiagency. In this work, we focus on general RMGs with LFA. For uncertainty sets defined by total variation distance, we develop provably data-efficient algorithms that break the curse of multiagency in both the generative model setting and a newly proposed online interactive setting. To our knowledge, our results are the first to break the curse of multiagency in sample complexity for RMGs with large (possibly infinite) state spaces, regardless of how the uncertainty set is constructed.
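To make the total-variation uncertainty set concrete: for a finite nominal distribution p and TV radius ρ, the worst-case expected value inf{ qᵀv : TV(q, p) ≤ ρ } is attained by greedily moving up to ρ probability mass from the highest-value states onto the lowest-value state. The sketch below illustrates this standard robust-MDP primitive on a toy example; it is not the paper's algorithm, and the function name is hypothetical.

```python
import numpy as np

def tv_worst_case_value(p, v, rho):
    """Worst-case expectation of v over distributions q with TV(q, p) <= rho.

    Greedy adversary: shift up to rho total mass from the states with the
    highest values of v onto the state with the lowest value, which is
    optimal for a TV ball (TV = half the L1 distance).
    """
    q = p.astype(float).copy()
    lo = np.argmin(v)                 # mass sink: lowest-value state
    budget = rho
    for s in np.argsort(v)[::-1]:     # drain highest-value states first
        if s == lo or budget <= 0:
            continue
        moved = min(q[s], budget)
        q[s] -= moved
        q[lo] += moved
        budget -= moved
    return float(q @ v)

# Toy example: two states with values 0 and 10, uniform nominal p.
p = np.array([0.5, 0.5])
v = np.array([0.0, 10.0])
tv_worst_case_value(p, v, 0.0)   # nominal expectation: 5.0
tv_worst_case_value(p, v, 0.2)   # adversary shifts 0.2 mass: 3.0
```

The robust Bellman operator in TV-ball RMGs applies exactly this kind of inner minimization to the next-state value function at every backup, which is why the uncertainty-set geometry (here, TV distance) shapes the achievable sample complexity.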