CoFlow: Coordinated Few-Step Flow for Offline Multi-Agent Decision Making

arXiv cs.AI / 5/5/2026


Key Points

  • The paper argues that offline multi-agent RL methods using generative models don’t necessarily need many iterative sampling steps to maintain inter-agent coordination.
  • It introduces Coordinated few-step Flow (CoFlow), which uses a natively joint-coupled velocity field via Coordinated Velocity Attention (CVA) plus Adaptive Coordination Gating to preserve coordination in single-pass multi-agent generation.
  • CoFlow replaces memory-prohibitive Jacobian-vector product (JVP) backpropagation through the averaged velocity field with a finite-difference consistency surrogate that needs only two stop-gradient forward passes.
  • Experiments across 60 configurations in MPE, MA-MuJoCo, and SMAC show CoFlow matches or surpasses Gaussian/value-based, transformer, diffusion, and prior flow baselines on episodic return.
  • Three independent coordination probes indicate that the gains come from improved inter-agent coordination rather than per-agent capacity, and a denoising-step sweep shows that single-pass inference suffices on every configuration. CoFlow reaches state-of-the-art coordination quality in 1–3 denoising steps under both centralized and decentralized execution.
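
The finite-difference idea in the key points can be illustrated on a toy problem. The sketch below is not the paper's implementation; it assumes the averaged velocity is defined as u(z_t, r, t) = (1/(t-r)) ∫ v dτ over [r, t], which implies the identity u = v - (t-r) · du/dt, where du/dt is the total derivative along the tangent (v, 0, 1). That derivative is normally a Jacobian-vector product; here it is approximated with two extra forward evaluations of u. The scalar ODE, the rate `A`, the step `eps`, and all function names are hypothetical choices made for illustration.

```python
import math

A = 0.7  # hypothetical rate of the toy ODE dz/dt = A * z


def instantaneous_velocity(z, t):
    """Velocity v(z, t) of the toy ODE (independent of t here)."""
    return A * z


def average_velocity(z, r, t):
    """Exact average velocity over [r, t] for the toy ODE, evaluated at z = z_t."""
    s = t - r
    return z * (1.0 - math.exp(-A * s)) / s


def total_derivative_fd(z, r, t, eps=1e-4):
    """Central finite difference of u along the tangent (v, 0, 1).

    Uses two forward evaluations of u and no Jacobian-vector product.
    In a learned model both evaluations would be wrapped in stop-gradient.
    """
    v = instantaneous_velocity(z, t)
    u_plus = average_velocity(z + eps * v, r, t + eps)
    u_minus = average_velocity(z - eps * v, r, t - eps)
    return (u_plus - u_minus) / (2.0 * eps)


def total_derivative_exact(z, r, t):
    """Closed-form total derivative, used only to check the approximation."""
    s = t - r
    g = (1.0 - math.exp(-A * s)) / s
    dg = (A * s * math.exp(-A * s) - (1.0 - math.exp(-A * s))) / (s * s)
    return g * instantaneous_velocity(z, t) + z * dg


def consistency_target(z, r, t):
    """Regression target v - (t - r) * du/dt built from the finite difference."""
    v = instantaneous_velocity(z, t)
    return v - (t - r) * total_derivative_fd(z, r, t)


z, r, t = 1.3, 0.2, 0.9
fd_error = abs(total_derivative_fd(z, r, t) - total_derivative_exact(z, r, t))
target_gap = abs(consistency_target(z, r, t) - average_velocity(z, r, t))
```

Because `average_velocity` is exact in this toy, the finite-difference target reproduces it up to discretization error, so both `fd_error` and `target_gap` are small. With a neural network in place of the closed form, the target would be held constant during the gradient step, which is where the memory saving over backpropagating through a JVP would come from.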

Abstract

Generative models have emerged as a major paradigm for offline multi-agent reinforcement learning (MARL), but existing approaches require many iterative sampling steps. Recent few-step accelerations either distill a joint teacher into independent students or apply averaged velocities independently per agent, suggesting that few-step inference requires sacrificing inter-agent coordination. We show this trade-off is not necessary: single-pass multi-agent generation can preserve coordination when the velocity field is natively joint-coupled. We propose Coordinated few-step Flow (CoFlow), an architecture that combines Coordinated Velocity Attention (CVA) with Adaptive Coordination Gating. A finite-difference consistency surrogate further replaces memory-prohibitive Jacobian-vector product backpropagation through the averaged velocity field with two stop-gradient forward passes. Across 60 configurations spanning MPE, MA-MuJoCo, and SMAC, CoFlow matches or surpasses Gaussian / value-based, transformer, diffusion, and prior flow baselines on episodic return. Three independent coordination probes confirm that the gains flow through inter-agent coordination rather than per-agent capacity. A denoising-step sweep shows that single-pass inference suffices on every configuration. CoFlow reaches state-of-the-art coordination quality in 1-3 denoising steps under both centralized and decentralized execution. Project page: https://github.com/Guowei-Zou/coflow.
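
To make the claim about single-pass generation with a joint-coupled field more concrete, here is a minimal toy sketch. It is not CoFlow's architecture (there is no attention or gating here); it assumes a hypothetical two-agent rotation ODE in which each agent's velocity depends on the other agent's action, and it uses the closed-form average velocity in place of a learned network. The constant `OMEGA`, the time convention (t = 1 is noise, t = 0 is data), and all function names are illustrative assumptions.

```python
import math

OMEGA = 1.2  # hypothetical coupling strength between the two agents


def instantaneous_velocity(z):
    """Joint field: each agent's velocity depends on the other agent's action."""
    x1, x2 = z
    return (-OMEGA * x2, OMEGA * x1)


def flow(z, t, r):
    """Exact solution of the toy ODE, carrying the joint state from time t to r."""
    x1, x2 = z
    angle = OMEGA * (r - t)
    c, s = math.cos(angle), math.sin(angle)
    return (c * x1 - s * x2, s * x1 + c * x2)


def average_velocity(z, r, t):
    """Average velocity over [r, t]; a learned network would stand in for this."""
    z_r = flow(z, t, r)
    return tuple((a - b) / (t - r) for a, b in zip(z, z_r))


def sample(z_noise, num_steps):
    """Few-step sampler: move from t = 1 to t = 0 in num_steps average-velocity jumps."""
    z = z_noise
    for k in range(num_steps):
        t = 1.0 - k / num_steps
        r = 1.0 - (k + 1) / num_steps
        u = average_velocity(z, r, t)
        z = tuple(a - (t - r) * b for a, b in zip(z, u))
    return z


def euler_one_step(z_noise):
    """Baseline: a single Euler step over [0, 1] with the instantaneous velocity."""
    v = instantaneous_velocity(z_noise)
    return tuple(a - b for a, b in zip(z_noise, v))


def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


z_noise = (1.0, 0.5)
exact = flow(z_noise, 1.0, 0.0)
one_step_error = distance(sample(z_noise, 1), exact)
three_step_error = distance(sample(z_noise, 3), exact)
euler_error = distance(euler_one_step(z_noise), exact)
```

In this toy, one jump with the exact average velocity lands on the same joint action as three jumps, while a single Euler step with the instantaneous velocity does not. The joint action is recovered only because the field sees both agents at once, which is the intuition behind keeping the velocity field coupled rather than applying it independently per agent.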