AID: Agent Intent from Diffusion for Multi-Agent Informative Path Planning

arXiv cs.RO / 5/1/2026


Key Points

  • The paper tackles multi-agent informative path planning (MAIPP), where agents must coordinate to maximize information gain under time/budget constraints as the environment belief updates with new measurements.
  • It identifies limitations of prior learning-based coordination methods that use autoregressive “intent” predictors, noting they are computationally expensive and can suffer from compounding errors.
  • The authors propose AID, a fully decentralized MAIPP framework that uses diffusion models to generate long-horizon trajectories in a non-autoregressive way, improving coordination efficiency.
  • AID is trained in two stages: behavior cloning from trajectories produced by existing MAIPP planners, followed by reinforcement learning using Diffusion Policy Policy Optimization (DPPO) with online reward feedback.
  • Experiments show AID outperforms the baseline MAIPP planners it is trained from, delivering up to 4× faster execution and up to 17% higher information gain while scaling to larger agent teams, and the code is released publicly.
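The non-autoregressive generation described above can be illustrated with a minimal DDPM-style sampling sketch: instead of predicting waypoints one at a time, a diffusion policy denoises the entire planning horizon jointly. This is a toy sketch, not the paper's actual model; the `eps_model` stand-in, the horizon length, and the noise schedule are all hypothetical, and a trained AID policy would condition the noise predictor on the agent's belief map and neighboring agents' intents.

```python
import numpy as np

rng = np.random.default_rng(0)

H, D = 16, 2          # planning horizon (waypoints) and state dimension
T = 50                # number of denoising steps

# Standard linear beta schedule, as in DDPM-style samplers.
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_model(x, t):
    """Stand-in for the learned noise predictor (hypothetical).
    A real policy network would condition on the environment belief
    and other agents' broadcast intents; here we just shrink toward 0."""
    return 0.1 * x

# Start from pure noise over the WHOLE horizon and denoise it jointly,
# so all H waypoints emerge together rather than step by step.
x = rng.standard_normal((H, D))
for t in reversed(range(T)):
    eps = eps_model(x, t)
    # DDPM reverse-step mean
    x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t > 0:  # add sampling noise except at the final step
        x = x + np.sqrt(betas[t]) * rng.standard_normal((H, D))

trajectory = x  # shape (H, D): a full long-horizon path in one pass
```

Because the whole trajectory is produced in one reverse-diffusion pass, there is no step-by-step rollout for prediction errors to compound through, which is the advantage the authors claim over autoregressive intent predictors.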

Abstract

Information gathering in large-scale or time-critical scenarios (e.g., environmental monitoring, search and rescue) requires broad coverage within limited time budgets, motivating the use of multi-agent systems. These scenarios are commonly formulated as multi-agent informative path planning (MAIPP), where multiple agents must coordinate to maximize information gain while operating under budget constraints. A central challenge in MAIPP is ensuring effective coordination while the belief over the environment evolves with incoming measurements. Recent learning-based approaches address this by using distributions over future positions as "intent" to support coordination. However, these autoregressive intent predictors are computationally expensive and prone to compounding errors. Inspired by the effectiveness of diffusion models as expressive, long-horizon policies, we propose AID, a fully decentralized MAIPP framework that leverages diffusion models to generate long-term trajectories in a non-autoregressive manner. AID first performs behavior cloning on trajectories produced by existing MAIPP planners and then fine-tunes the policy using reinforcement learning via Diffusion Policy Policy Optimization (DPPO). This two-stage pipeline enables the policy to inherit expert behavior while learning improved coordination through online reward feedback. Experiments demonstrate that AID consistently improves upon the MAIPP planners it is trained from, achieving 4x faster execution and up to 17% increased information gain, while scaling effectively to larger numbers of agents. Our implementation is publicly available at https://github.com/marmotlab/AID.
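The first stage of the two-stage pipeline (behavior cloning) reduces to the standard diffusion training objective: forward-diffuse an expert trajectory to a random noise level and regress the injected noise. The sketch below illustrates that loss under assumed shapes and schedule; `eps_model` and the expert trajectory are hypothetical placeholders, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

T = 50
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def bc_loss(eps_model, expert_traj):
    """Stage 1 (behavior cloning): noise-prediction MSE on an expert
    trajectory, i.e. the usual denoising diffusion training objective."""
    t = int(rng.integers(0, T))
    eps = rng.standard_normal(expert_traj.shape)
    # Forward-diffuse the expert trajectory to noise level t.
    x_t = (np.sqrt(alpha_bars[t]) * expert_traj
           + np.sqrt(1.0 - alpha_bars[t]) * eps)
    return float(np.mean((eps_model(x_t, t) - eps) ** 2))

# Hypothetical expert trajectory, e.g. from a baseline MAIPP planner.
expert = rng.standard_normal((16, 2))
loss = bc_loss(lambda x, t: np.zeros_like(x), expert)  # untrained predictor
```

In the second stage, DPPO treats each denoising step as an action in a Markov decision process and fine-tunes the same noise predictor with PPO-style policy-gradient updates, using the online information-gain reward; that RL loop is beyond the scope of this sketch.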