Cooperative Informative Sensing for Monitoring Dynamic Indoor Environments via Multi-Agent Reinforcement Learning

arXiv cs.RO / 4/28/2026

Key Points

  • The paper addresses monitoring human activity in dynamic indoor environments and argues that traditional multi-robot objectives (e.g., coverage or visitation) are only weakly aligned with the accuracy requirements of human-centric monitoring.
  • It formulates cooperative active observation as a decentralized control problem under partial observability, where robots choose motions to directly optimize monitoring accuracy.
  • The authors propose a learning-based MARL framework that trains cooperative policies from decentralized observations, including an architecture designed to handle variable numbers of humans and temporal dependencies.
  • Simulation experiments across multiple indoor settings and monitoring tasks show consistent improvements over classical coverage, persistent monitoring, and learning-free multi-robot baselines, with robustness to changes in the number of observed humans.
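
The decentralized formulation in the second point can be written down in a standard Dec-POMDP style. The notation below is an assumption made for illustration (none of these symbols appear in the source): robot $i$ acts on its own observation history $h_t^i$ through a policy $\pi^i$, and the shared reward is taken to be the negative monitoring error $\ell$ between the team's estimate $\hat{y}_t$ and the true human-activity quantity $y_t$.

```latex
\begin{align}
  a_t^i &\sim \pi^i\!\left(\cdot \mid h_t^i\right),
  \qquad h_t^i = \left(o_0^i, a_0^i, \dots, o_t^i\right),
  \qquad i = 1, \dots, N, \\
  r_t &= -\,\ell\!\left(\hat{y}_t,\, y_t\right), \\
  \max_{\pi^1, \dots, \pi^N} \; J &= \mathbb{E}\!\left[\sum_{t=0}^{T} \gamma^{t}\, r_t\right].
\end{align}
```

Under this reading, a coverage or visitation objective would replace $r_t$ with a count of visited cells, which is the kind of proxy the paper argues is only weakly tied to monitoring accuracy.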

Abstract

Monitoring human activity in indoor environments is important for applications such as facility management, safety assessment, and space utilization analysis. While mobile robot teams offer the potential to actively improve observation quality, existing multi-robot monitoring and active perception approaches typically rely on coverage- or visitation-based objectives that are weakly aligned with the accuracy requirements of human-centric monitoring tasks. In this work, we formulate cooperative active observation as a decentralized control problem in which multiple robots adjust their motion to directly optimize monitoring accuracy under partial observability. We propose a learning-based framework that trains cooperative policies from decentralized observations using multi-agent reinforcement learning (MARL), supported by an architecture that handles variable numbers of humans and temporal dependencies. Simulation results across diverse indoor environments and monitoring tasks show that the proposed approach consistently outperforms classical coverage, persistent monitoring, and learning-free multi-robot baselines, while remaining robust to changes in the number of observed humans.
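
The abstract does not say how the architecture handles variable numbers of humans and temporal dependencies. One common pattern, sketched below purely as an assumption and not as the authors' design, is to embed each observed human separately, mean-pool the embeddings into a fixed-size vector (so the result does not depend on how many humans are seen or in what order), and feed that vector into a recurrent state. The names `encode_humans`, `recurrent_step`, and `random_matrix`, the feature and hidden sizes, and the random weights are all hypothetical.

```python
import math
import random


def encode_humans(human_feats, W_h):
    """Mean-pool per-human embeddings into one fixed-size vector.

    human_feats: list of per-human feature lists (may be empty).
    W_h: feat_dim x hidden_dim weight matrix, stored as a list of rows.
    """
    hidden_dim = len(W_h[0])
    if not human_feats:
        # No humans in view: return a zero summary of the same size.
        return [0.0] * hidden_dim
    pooled = [0.0] * hidden_dim
    for feats in human_feats:
        for j in range(hidden_dim):
            pooled[j] += math.tanh(sum(f * W_h[i][j] for i, f in enumerate(feats)))
    return [p / len(human_feats) for p in pooled]


def recurrent_step(h_prev, x, W_x, W_hh):
    """Plain tanh recurrent update: h_t = tanh(x W_x + h_prev W_hh)."""
    n = len(h_prev)
    return [
        math.tanh(
            sum(x[i] * W_x[i][j] for i in range(len(x)))
            + sum(h_prev[i] * W_hh[i][j] for i in range(n))
        )
        for j in range(n)
    ]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
FEAT_DIM, HIDDEN_DIM = 4, 8  # assumed sizes, chosen only for the example
W_h = random_matrix(FEAT_DIM, HIDDEN_DIM, rng)
W_x = random_matrix(HIDDEN_DIM, HIDDEN_DIM, rng)
W_hh = random_matrix(HIDDEN_DIM, HIDDEN_DIM, rng)

# One robot's hidden state over three steps with 3, 0, then 5 visible humans.
h = [0.0] * HIDDEN_DIM
for n_humans in (3, 0, 5):
    obs = [[rng.gauss(0.0, 1.0) for _ in range(FEAT_DIM)] for _ in range(n_humans)]
    h = recurrent_step(h, encode_humans(obs, W_h), W_x, W_hh)
```

In a MARL setting each robot would run its own copy of such an encoder on its local observations, with the hidden state `h` feeding a policy head. Attention-based pooling or a gated recurrent unit would be natural substitutes for the mean pooling and tanh update used here.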