Bridging MARL to SARL: An Order-Independent Multi-Agent Transformer via Latent Consensus

arXiv cs.AI / 4/16/2026

Key Points

The paper introduces the Consensus Multi-Agent Transformer (CMAT), a centralized approach that recasts cooperative MARL as a hierarchical single-agent RL (SARL) problem and uses a Transformer encoder to process the large joint observation space.
CMAT generates a high-level latent “consensus” vector with an autoregressive Transformer decoder, so agents can make order-independent joint decisions and avoid the sensitivity to action-generation order found in conventional Multi-Agent Transformers (MAT).
By conditioning simultaneous agent actions on the latent consensus, the method enables joint policy optimization using single-agent PPO while retaining coordinated behavior.
Experiments on StarCraft II, Multi-Agent MuJoCo, and Google Research Football show CMAT outperforming recent centralized methods, sequential MARL approaches, and standard MARL baselines.
The authors provide an open-source implementation of CMAT in a public GitHub repository, facilitating reproduction and further experimentation.
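
The order-independence point above can be made concrete by comparing two policy factorizations. The notation here is assumed for illustration and is not taken from the paper: o is the joint observation, a_i is the action of agent i among n agents, and z = (z_1, ..., z_K) is a consensus made of K latent components. The first line is the familiar autoregressive factorization over agent actions, which depends on the chosen agent ordering; the second is one plausible reading of the consensus-based factorization, where only the latent consensus is generated sequentially and agent actions are conditionally independent given it.

```latex
% Autoregressive over agents: the result depends on the ordering 1..n
\pi_{\text{seq}}(a \mid o) = \prod_{i=1}^{n} \pi\left(a_i \mid o,\, a_{<i}\right)

% Consensus-conditioned (assumed form): sequential only in latent space
\pi_{\text{cons}}(z, a \mid o) = \prod_{k=1}^{K} p\left(z_k \mid o,\, z_{<k}\right) \prod_{i=1}^{n} \pi\left(a_i \mid o,\, z\right)
```

Under this reading, the pair (z, a) plays the role of a single agent's action, so its log-probability is a plain sum of the terms above and can feed a standard PPO probability ratio.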

Abstract

Cooperative multi-agent reinforcement learning (MARL) is widely used to address large joint observation and action spaces by decomposing a centralized control problem into multiple interacting agents. However, such decomposition often introduces additional challenges, including non-stationarity, unstable training, weak coordination, and limited theoretical guarantees. In this paper, we propose the Consensus Multi-Agent Transformer (CMAT), a centralized framework that bridges cooperative MARL to a hierarchical single-agent reinforcement learning (SARL) formulation. CMAT treats all agents as a unified entity and employs a Transformer encoder to process the large joint observation space. To handle the extensive joint action space, we introduce a hierarchical decision-making mechanism in which a Transformer decoder autoregressively generates a high-level consensus vector, simulating the process by which agents reach agreement on their strategies in latent space. Conditioned on this consensus, all agents generate their actions simultaneously, enabling order-independent joint decision making and avoiding the sensitivity to action-generation order in conventional Multi-Agent Transformers (MAT). This factorization allows the joint policy to be optimized using single-agent PPO while preserving expressive coordination through the latent consensus. To evaluate the proposed method, we conduct experiments on benchmark tasks from StarCraft II, Multi-Agent MuJoCo, and Google Research Football. The results show that CMAT achieves superior performance over recent centralized solutions, sequential MARL methods, and conventional MARL baselines. The code for this paper is available at https://github.com/RS2002/CMAT.
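
The hierarchical decision step described in the abstract can be illustrated with a small NumPy sketch. This is a toy under stated assumptions, not the authors' implementation: the class `ToyConsensusPolicy`, its linear maps (standing in for the Transformer encoder and decoder), the discrete consensus tokens, and all sizes are hypothetical. It shows a consensus sampled autoregressively, agent actions sampled simultaneously given that consensus, and a single joint log-probability of the kind a single-agent PPO ratio would use.

```python
import numpy as np

def log_softmax(x):
    """Numerically stable log-softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

class ToyConsensusPolicy:
    """Toy consensus-conditioned joint policy (hypothetical, not the CMAT code)."""

    def __init__(self, obs_dim, n_actions, n_tokens=4, vocab=8, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.n_tokens, self.vocab = n_tokens, vocab
        # Linear maps standing in for Transformer encoder/decoder (assumption).
        self.W_enc = rng.normal(size=(obs_dim, hidden)) * 0.1
        self.W_tok = rng.normal(size=(hidden + vocab, vocab)) * 0.1
        self.W_act = rng.normal(size=(hidden + n_tokens * vocab, n_actions)) * 0.1

    def encode(self, obs):
        # obs: (n_agents, obs_dim) -> per-agent features (n_agents, hidden)
        return np.tanh(obs @ self.W_enc)

    def sample_consensus(self, h, rng):
        # Mean pooling makes the context invariant to agent ordering.
        ctx = h.mean(axis=0)
        history = np.zeros(self.vocab)  # decayed sum of earlier tokens
        tokens, logp = [], 0.0
        for _ in range(self.n_tokens):
            lp = log_softmax(np.concatenate([ctx, history]) @ self.W_tok)
            t = rng.choice(self.vocab, p=np.exp(lp))
            logp += lp[t]
            history = 0.5 * history + np.eye(self.vocab)[t]
            tokens.append(t)
        return np.array(tokens), logp

    def action_log_probs(self, h, tokens):
        # Every agent sees its own features plus the same consensus,
        # and never another agent's action.
        z = np.eye(self.vocab)[tokens].ravel()
        x = np.concatenate([h, np.tile(z, (h.shape[0], 1))], axis=1)
        return log_softmax(x @ self.W_act)  # (n_agents, n_actions)

    def act(self, obs, rng):
        h = self.encode(obs)
        tokens, logp_z = self.sample_consensus(h, rng)
        lp = self.action_log_probs(h, tokens)
        actions = np.array([rng.choice(lp.shape[1], p=np.exp(row)) for row in lp])
        logp_a = lp[np.arange(len(actions)), actions].sum()
        # Joint log-prob of (consensus, actions), treated as one agent's action.
        return tokens, actions, logp_z + logp_a

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    policy = ToyConsensusPolicy(obs_dim=5, n_actions=4)
    tokens, actions, logp = policy.act(rng.normal(size=(3, 5)), rng)
    print(tokens, actions, logp)
```

Because each agent's action distribution depends only on its own features and the shared consensus, permuting the agents simply permutes the rows of `action_log_probs`. In a PPO update, the returned joint log-probability would enter the ratio `exp(logp_new - logp_old)` as in the single-agent case; how the paper handles this in detail is not specified in the abstract.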