Equivariant Multi-agent Reinforcement Learning for Multimodal Vehicle-to-Infrastructure Systems

arXiv cs.LG / 4/9/2026


Key Points

  • The paper studies a decentralized vehicle-to-infrastructure (V2I) setting where multiple road-side units (RSUs) collect multimodal data (wireless plus visual) from moving vehicles to jointly improve network performance.
  • It formulates the RSU resource optimization as a distributed multi-agent reinforcement learning (MARL) problem that incorporates rotation symmetries in vehicle positions to make policies equivariant.
  • The authors introduce a self-supervised learning framework at each base station that aligns latent multimodal features to infer vehicle positions locally from its own observations.
  • They train an equivariant policy using a graph neural network (GNN) with message passing, plus a signaling coordination scheme so agents can collaborate despite partial observability.
  • Simulation results, obtained with ray-tracing and computer-graphics data, show more than two-fold accuracy gains for the sensing approach over baselines and more than 50% performance gains for the equivariant MARL training over standard methods.
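
To make the equivariance idea in the key points concrete, here is a minimal NumPy sketch (not the paper's architecture): one message-passing step whose per-agent output is a sum of direction vectors to other agents, weighted by a function of pairwise distance only. Because distances are rotation-invariant and directions rotate with the input, rotating all vehicle positions rotates the output the same way. The function names (`phi`, `equivariant_layer`) are illustrative placeholders, and `phi` stands in for a learned network.

```python
import numpy as np

def phi(d):
    # Scalar message weight from a pairwise distance (rotation-invariant);
    # a stand-in for a learned MLP.
    return np.exp(-d)

def equivariant_layer(pos):
    """One message-passing step over agent positions (n x 2).
    Each agent aggregates direction vectors to the others, weighted by
    an invariant function of distance, so the output transforms with
    the same rotation applied to the input (equivariance)."""
    n = pos.shape[0]
    out = np.zeros_like(pos)
    for i in range(n):
        for j in range(n):
            if i != j:
                diff = pos[i] - pos[j]
                out[i] += phi(np.linalg.norm(diff)) * diff
    return out

def rot(theta):
    # 2D rotation matrix.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

pos = np.array([[0.0, 0.0], [1.0, 0.5], [-0.5, 2.0]])
R = rot(0.7)
# Equivariance check: rotating the input rotates the output.
assert np.allclose(equivariant_layer(pos @ R.T),
                   equivariant_layer(pos) @ R.T)
```

A real equivariant GNN policy would stack such layers with learned invariant weight functions and additional node features, but the same rotate-then-process vs. process-then-rotate check applies.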

Abstract

In this paper, we study a vehicle-to-infrastructure (V2I) system where distributed base stations (BSs) acting as road-side units (RSUs) collect multimodal (wireless and visual) data from moving vehicles. We consider a decentralized rate maximization problem, where each RSU relies on its local observations to optimize its resources, while all RSUs must collaborate to guarantee favorable network performance. We recast this problem as a distributed multi-agent reinforcement learning (MARL) problem, by incorporating rotation symmetries in terms of vehicles' locations. To exploit these symmetries, we propose a novel self-supervised learning framework where each BS agent aligns the latent features of its multimodal observation to extract the positions of the vehicles in its local region. Equipped with this sensing data at each RSU, we train an equivariant policy network using a graph neural network (GNN) with message passing layers, such that each agent computes its policy locally, while all agents coordinate their policies via a signaling scheme that overcomes partial observability and guarantees the equivariance of the global policy. We present numerical results carried out in a simulation environment, where ray-tracing and computer graphics are used to collect wireless and visual data. Results show the generalizability of our self-supervised and multimodal sensing approach, achieving more than two-fold accuracy gains over baselines, and the efficiency of our equivariant MARL training, attaining more than 50% performance gains over standard approaches.
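
The abstract's self-supervised alignment of wireless and visual latent features can be sketched with a generic contrastive objective. The paper's exact loss is not specified here, so the following is only an InfoNCE-style illustration: embeddings of the two modalities for the same vehicle (same row index) are pulled together, mismatched pairs pushed apart. The names `info_nce`, `z_wireless`, `z_visual`, and `tau` are assumptions for the sketch.

```python
import numpy as np

def info_nce(z_wireless, z_visual, tau=0.1):
    """Symmetric contrastive alignment loss between L2-normalised
    wireless and visual embeddings (both n x d). Row i of each matrix
    is assumed to describe the same vehicle."""
    zw = z_wireless / np.linalg.norm(z_wireless, axis=1, keepdims=True)
    zv = z_visual / np.linalg.norm(z_visual, axis=1, keepdims=True)
    logits = zw @ zv.T / tau                      # pairwise similarities
    # Log-softmax over candidates, then take the matching (diagonal) pairs.
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(zw))
    return -log_p[idx, idx].mean()

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))
# Correctly paired modalities should score a lower loss than mispaired ones.
aligned = info_nce(z, z)
misaligned = info_nce(z, z[::-1].copy())
assert aligned < misaligned
```

In the paper's pipeline, a regression head on top of such aligned latents would then estimate the local vehicle positions that feed the equivariant policy.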