CROSS: A Mixture-of-Experts Reinforcement Learning Framework for Generalizable Large-Scale Traffic Signal Control

arXiv cs.RO / 3/27/2026

Key Points

  • The paper proposes CROSS, a decentralized reinforcement learning framework for adaptive traffic signal control designed to generalize across diverse, large-scale intersection topologies and traffic patterns.
  • CROSS uses a Mixture-of-Experts (MoE) approach, combining a shared policy with multiple scenario-adaptive experts to better capture varying traffic dynamics than single-policy RL methods.
  • It introduces a Predictive Contrastive Clustering (PCC) module that forecasts short-term state transitions and uses clustering plus contrastive learning to form more robust, pattern-level representations.
  • Experiments in the SUMO simulator on both synthetic and real-world datasets show that CROSS outperforms state-of-the-art baselines in both control performance and generalization to new scenarios.
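
One way to picture "a shared policy combined with scenario-adaptive experts" is the minimal sketch below. It is an illustration under stated assumptions, not the paper's architecture: the linear layers, the soft gating over experts, the additive combination of logits, and every name (`SharedPlusExpertsPolicy`, `gate_weights`, `phase_probs`) are hypothetical, and the weights are random rather than trained.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Matrix-vector product; `weights` is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class SharedPlusExpertsPolicy:
    """Hypothetical per-intersection policy: shared head plus gated experts."""

    def __init__(self, obs_dim, n_phases, n_experts, seed=0):
        rng = random.Random(seed)
        # Shared head, used for every scenario.
        self.shared = random_matrix(n_phases, obs_dim, rng)
        # One head per expert; each could specialize in a traffic pattern.
        self.experts = [random_matrix(n_phases, obs_dim, rng)
                        for _ in range(n_experts)]
        # Gate that scores experts from the local observation (an assumption;
        # the gate could equally be driven by a learned pattern embedding).
        self.gate = random_matrix(n_experts, obs_dim, rng)

    def gate_weights(self, obs):
        return softmax(linear(self.gate, obs))

    def phase_probs(self, obs):
        # Start from the shared logits, then add each expert's logits
        # scaled by its gate weight.
        logits = linear(self.shared, obs)
        for weight, expert in zip(self.gate_weights(obs), self.experts):
            expert_logits = linear(expert, obs)
            logits = [l + weight * e for l, e in zip(logits, expert_logits)]
        return softmax(logits)


policy = SharedPlusExpertsPolicy(obs_dim=8, n_phases=4, n_experts=3)
obs = [0.2, 0.0, 0.7, 0.1, 0.4, 0.0, 0.3, 0.9]  # made-up local observation
probs = policy.phase_probs(obs)  # distribution over signal phases
```

In a decentralized setup each intersection would evaluate such a policy on its own observation. Because the shared head contributes to every decision, the experts only need to learn scenario-specific corrections.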

Abstract

Recent advances in robotics, automation, and artificial intelligence have enabled urban traffic systems to operate with increasing autonomy, a step toward future smart cities. This progress is powered in part by adaptive traffic signal control (ATSC), which dynamically adjusts signal phases to mitigate congestion and improve traffic flow. However, achieving effective and generalizable large-scale ATSC remains a significant challenge because intersection topologies are diverse and traffic demand patterns across the network are complex and highly dynamic. Existing reinforcement learning (RL)-based methods typically use a single shared policy for all scenarios, whose limited representational capacity makes it difficult to capture diverse traffic dynamics and generalize to unseen environments. To address these challenges, we propose CROSS, a novel Mixture-of-Experts (MoE)-based decentralized RL framework for generalizable ATSC. We first introduce a Predictive Contrastive Clustering (PCC) module that forecasts short-term state transitions to identify latent traffic patterns, followed by clustering and contrastive learning to enhance pattern-level representations. We further design a Scenario-Adaptive MoE module that augments a shared policy with multiple experts, thus enabling adaptive specialization and more flexible scenario-specific strategies. We conduct extensive experiments in the SUMO simulator on both synthetic and real-world traffic datasets. Compared with state-of-the-art baselines, CROSS achieves superior performance and generalization through improved representation of diverse traffic scenarios.
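
The contrastive step of the PCC module can be illustrated with a generic cluster-based contrastive loss. The sketch below is an assumption-laden stand-in, not the paper's objective. It uses a supervised-contrastive (InfoNCE-style) loss in which embeddings that share a cluster label act as positives. The forecasting and clustering stages are omitted, so the cluster labels are simply given, and the function name, temperature value, and toy embeddings are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cluster_contrastive_loss(embeddings, cluster_ids, temperature=0.5):
    """Average loss that pulls same-cluster embeddings together.

    For each anchor i, every other sample with the same cluster id is a
    positive; all remaining samples act as negatives in the denominator.
    """
    n = len(embeddings)
    losses = []
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and cluster_ids[j] == cluster_ids[i]]
        if not positives:
            continue  # an anchor alone in its cluster contributes nothing
        sims = {j: math.exp(cosine(embeddings[i], embeddings[j]) / temperature)
                for j in range(n) if j != i}
        denom = sum(sims.values())
        anchor_loss = -sum(math.log(sims[j] / denom) for j in positives)
        losses.append(anchor_loss / len(positives))
    return sum(losses) / len(losses) if losses else 0.0


# Toy pattern embeddings: two tight groups in 2-D.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
matching = cluster_contrastive_loss(embeddings, [0, 0, 1, 1])
mismatched = cluster_contrastive_loss(embeddings, [0, 1, 0, 1])
```

With labels that agree with the geometry (`matching`) the loss is lower than with labels that cut across the two groups (`mismatched`). Minimizing such a loss pushes an encoder in that direction: embeddings of the same traffic pattern move closer, and different patterns move apart.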