
A Robust and Efficient Multi-Agent Reinforcement Learning Framework for Traffic Signal Control

arXiv cs.AI / 3/13/2026


Key Points

  • The paper presents a robust multi-agent reinforcement learning framework for traffic signal control validated in the Vissim simulator.
  • It introduces Turning Ratio Randomization to train agents under dynamic turning probabilities, improving robustness to unseen traffic scenarios.
  • It proposes a stability-oriented Exponential Phase Duration Adjustment action space that balances responsiveness and precision via cyclical exponential phase adjustments.
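  • It pairs a Neighbor-Based Observation scheme with MAPPO under Centralized Training with Decentralized Execution (CTDE): each agent acts on local and neighboring information, while centralized updates during training approximate the benefit of global observations.
  • It reports a reduction of more than 10% in average waiting time relative to standard RL baselines, along with better generalization to unseen traffic scenarios.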
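
To make the neighbor-based observation idea concrete, here is a minimal sketch of how a per-intersection observation might be assembled. The summary does not describe the paper's actual feature layout, so everything here is an assumption for illustration: the function names `build_observation` and `build_critic_input`, the `max_neighbors` padding scheme, and the choice to feed the centralized critic the concatenation of all local features.

```python
def build_observation(agent_id, local_features, neighbors, max_neighbors=4):
    """Concatenate an agent's own features with those of its neighbors.

    local_features: dict mapping intersection id -> list of floats
    neighbors: dict mapping intersection id -> list of adjacent ids
    The result is zero-padded so every agent sees a fixed-length vector.
    """
    own = local_features[agent_id]
    used = neighbors.get(agent_id, [])[:max_neighbors]
    obs = list(own)
    for neighbor_id in used:
        obs.extend(local_features[neighbor_id])
    # Pad the slots of missing neighbors (e.g. at the network boundary).
    obs.extend([0.0] * ((max_neighbors - len(used)) * len(own)))
    return obs


def build_critic_input(local_features, agent_order):
    """Assumed centralized-critic input: all local features in a fixed order."""
    return [x for agent_id in agent_order for x in local_features[agent_id]]


# Hypothetical three-intersection corridor A - B - C.
features = {"A": [1.0, 2.0], "B": [3.0, 4.0], "C": [5.0, 6.0]}
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

actor_obs_a = build_observation("A", features, adjacency, max_neighbors=2)
critic_input = build_critic_input(features, ["A", "B", "C"])
```

In this sketch the actor input grows with the number of neighbors rather than with the size of the network, which is one way to read the "scalable local communication" claim; the wider critic input is only needed during training.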

Abstract

Reinforcement Learning (RL) in Traffic Signal Control (TSC) faces significant hurdles in real-world deployment due to limited generalization to dynamic traffic flow variations. Existing approaches often overfit static patterns and use action spaces incompatible with driver expectations. This paper proposes a robust Multi-Agent Reinforcement Learning (MARL) framework validated in the Vissim traffic simulator. The framework integrates three mechanisms: (1) Turning Ratio Randomization, a training strategy that exposes agents to dynamic turning probabilities to enhance robustness against unseen scenarios; (2) a stability-oriented Exponential Phase Duration Adjustment action space, which balances responsiveness and precision through cyclical, exponential phase adjustments; and (3) a Neighbor-Based Observation scheme utilizing the MAPPO algorithm with Centralized Training with Decentralized Execution (CTDE). By leveraging centralized updates, this approach approximates the efficacy of global observations while maintaining scalable local communication. Experimental results demonstrate that our framework outperforms standard RL baselines, reducing average waiting time by over 10%. The proposed model exhibits superior generalization in unseen traffic scenarios and maintains high control stability, offering a practical solution for adaptive signal control.
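
Mechanisms (1) and (2) can be sketched in a few lines of Python. The abstract gives no formulas, so this is one plausible reading rather than the authors' implementation. It assumes that turning ratios are resampled per training episode from a symmetric Dirichlet distribution, that each action changes the current phase duration by zero or by plus or minus a power of two seconds, and that phases are visited in a fixed cyclic order. The names `sample_turning_ratios`, `apply_adjustment`, and `DELTAS`, and the green-time bounds, are hypothetical.

```python
import random

MOVEMENTS = ("left", "through", "right")  # hypothetical movement labels


def sample_turning_ratios(rng, concentration=2.0):
    """Draw one random turning-probability vector that sums to 1."""
    # Normalized gamma draws form a symmetric Dirichlet sample.
    draws = [rng.gammavariate(concentration, 1.0) for _ in MOVEMENTS]
    total = sum(draws)
    return {m: d / total for m, d in zip(MOVEMENTS, draws)}


# Assumed action set: keep the duration, or change it by +/- 2**k seconds.
# Evaluates to [0, 1, -1, 2, -2, 4, -4, 8, -8].
DELTAS = [0] + [sign * 2 ** k for k in range(4) for sign in (1, -1)]


def apply_adjustment(durations, phase_index, action, min_green=5, max_green=60):
    """Adjust the current phase's duration, then advance to the next phase.

    The phase order never changes; only durations do, and each one is
    clamped to the assumed [min_green, max_green] bounds in seconds.
    """
    updated = list(durations)
    proposed = updated[phase_index] + DELTAS[action]
    updated[phase_index] = max(min_green, min(max_green, proposed))
    next_phase = (phase_index + 1) % len(updated)
    return updated, next_phase


# One randomized episode setup and one control step.
rng = random.Random(0)
ratios = sample_turning_ratios(rng)
durations, next_phase = apply_adjustment([20, 20, 20, 20], 3, 5)
```

Under these assumptions, small deltas allow fine tuning while large ones allow a quick response to demand shifts, and keeping the cycle order fixed preserves the phase sequence drivers expect.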