Multi-agent Reach-avoid MDP via Potential Games and Low-rank Policy Structure

arXiv cs.RO / 4/10/2026


Key Points

  • Numerical experiments indicate substantially lower peak memory usage and offline computation complexity across different MDPs and agent counts, with the approximation error relative to the optimal global reach-avoid objective remaining small.

Abstract

We optimize finite-horizon multi-agent reach-avoid Markov decision processes (MDPs) via local feedback policies. The global feedback policy solution yields global optimality, but its communication complexity, memory usage, and computation complexity scale exponentially with the number of agents. We mitigate this exponential dependency by restricting the solution space to local feedback policies, and we show that local feedback policies are rank-one factorizations of global feedback policies, which provides a principled approach to reducing communication complexity and memory usage. Additionally, by demonstrating that multi-agent reach-avoid MDPs over local feedback policies have a potential game structure, we show that iterative best response is a tractable multi-agent learning scheme with guaranteed convergence to a deterministic Nash equilibrium, and we derive each agent's best response via a multiplicative dynamic program (DP) over the joint state space. Numerical simulations across different MDPs and agent counts show that peak memory usage and offline computation complexity are significantly reduced while the approximation error to the optimal global reach-avoid objective remains small.
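
To make the abstract's ingredients concrete, here is a minimal NumPy sketch, not the authors' implementation. Everything in it is an assumption chosen for illustration: two agents with independent random transition kernels, small illustrative reach and avoid sets, and deterministic policies that, for brevity, may depend on the joint state (the paper's local feedback policies depend only on each agent's own state, which is where the communication and memory savings come from). The sketch shows the outer-product (rank-one) structure of the induced joint action distribution, a product-form reach-avoid recursion in the spirit of the paper's multiplicative DP (whose exact recursion may differ), and an iterative best-response loop.

```python
# Minimal NumPy sketch (an illustration, not the authors' implementation).
import numpy as np

rng = np.random.default_rng(0)
n, m, T = 4, 2, 6  # local states, local actions, horizon (assumed sizes)

def random_kernel():
    """Assumed per-agent transition kernel P[s, a, s'], row-stochastic."""
    P = rng.random((n, m, n))
    return P / P.sum(axis=-1, keepdims=True)

P1, P2 = random_kernel(), random_kernel()

# Illustrative reach and avoid sets over the joint state (s1, s2).
reach = np.zeros((n, n)); reach[-1, -1] = 1.0
avoid = np.zeros((n, n)); avoid[0, -1] = 1.0
safe = (1.0 - reach) * (1.0 - avoid)  # neither reached nor violated

def joint_action_distribution(pi1_t, pi2_t):
    """Rank-one structure: at each joint state, the induced joint action
    distribution is the outer product pi(a1, a2 | s) = pi1(a1|s) pi2(a2|s),
    a rank-one matrix in (a1, a2) -- the factorization the abstract names."""
    e1, e2 = np.eye(m)[pi1_t], np.eye(m)[pi2_t]  # one-hot action rows
    return np.einsum('ija,ijb->ijab', e1, e2)

def best_response(agent, pi_other):
    """Backward DP over the *joint* state space for one agent's best
    response, using the product-form reach-avoid recursion
        V_T = reach,   V_t = reach + safe * E[V_{t+1}]
    (a simplified stand-in for the paper's multiplicative DP)."""
    V = reach.copy()
    br = np.zeros((T, n, n), dtype=int)
    for t in reversed(range(T)):
        # EV[i, j, a, b] = E[V(s1', s2') | s1=i, s2=j, a1=a, a2=b]
        EV = np.einsum('iax,jby,xy->ijab', P1, P2, V)
        idx = pi_other[t][:, :, None, None]  # other agent's fixed action
        if agent == 0:   # optimize over a1, freeze a2
            Q = np.take_along_axis(EV, idx, axis=3)[..., 0]
        else:            # optimize over a2, freeze a1
            Q = np.take_along_axis(EV, idx, axis=2)[:, :, 0, :]
        br[t] = np.argmax(Q, axis=-1)
        V = reach + safe * np.max(Q, axis=-1)
    return br, V

# Iterative best response: alternate best responses for a few rounds.
pi1 = rng.integers(m, size=(T, n, n))
pi2 = rng.integers(m, size=(T, n, n))
for _ in range(10):
    pi1, _ = best_response(0, pi2)
    pi2, V = best_response(1, pi1)
print("reach-avoid probability from joint state (0, 0):", V[0, 0])

# The policy pair induces a rank-one joint action distribution per state.
pi_joint = joint_action_distribution(pi1[0], pi2[0])
assert np.all(np.linalg.matrix_rank(pi_joint.reshape(n * n, m, m)) <= 1)
```

Because both agents share the same reach-avoid objective in this sketch, the game is identical-interest, a special case of a potential game, so the alternating best responses cannot cycle and settle at a deterministic Nash equilibrium, matching the convergence behavior the abstract describes.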