CuraLight: Debate-Guided Data Curation for LLM-Centered Traffic Signal Control

arXiv cs.AI / 4/8/2026


Key Points

  • The paper introduces CuraLight, an LLM-centered framework for traffic signal control that uses an RL agent to explore traffic environments and collect high-quality interaction trajectories for training data.
  • CuraLight converts the RL-generated trajectories into prompt-response pairs and applies imitation fine-tuning, aiming to improve interpretability and reduce the need for large amounts of interaction data.
  • It adds a multi-LLM ensemble “deliberation” mechanism that uses structured debate to evaluate candidate signal timing actions and produce preference-aware supervision signals.
  • Experiments in SUMO across heterogeneous real-world networks (Jinan, Hangzhou, Yizhuang) show consistent gains over state-of-the-art baselines: 5.34% lower average travel time, 5.14% shorter average queue length, and 7.02% lower average waiting time.
  • The study argues that combining RL-assisted exploration with debate-based data curation can yield scalable and more interpretable LLM-driven traffic signal strategies that generalize better across varied intersections.
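
The conversion of RL trajectories into prompt-response pairs described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `Step` schema, the prompt wording, and the `step_to_pair` helper are all hypothetical, since the summary does not specify the observation features or the prompt template.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Step:
    """One RL decision at an intersection (hypothetical schema)."""
    intersection: str
    queue_by_phase: Dict[str, int]  # queued vehicles per signal phase
    chosen_phase: str               # phase the RL agent selected


def step_to_pair(step: Step) -> Dict[str, str]:
    """Render one trajectory step as a prompt-response pair for imitation fine-tuning."""
    lines = [
        f"- {phase}: {queue} queued vehicles"
        for phase, queue in sorted(step.queue_by_phase.items())
    ]
    prompt = (
        f"Intersection {step.intersection}. Current queues:\n"
        + "\n".join(lines)
        + "\nWhich phase should receive green next?"
    )
    # The RL agent's action serves as the target response.
    return {"prompt": prompt, "response": step.chosen_phase}


# Toy trajectory with made-up values, for illustration only.
trajectory: List[Step] = [
    Step("J1", {"NS": 12, "EW": 3}, "NS"),
    Step("J1", {"NS": 2, "EW": 9}, "EW"),
]
pairs = [step_to_pair(s) for s in trajectory]
```

Each element of `pairs` is a plain dictionary, so the list can be written out as JSON Lines for whichever fine-tuning toolkit is in use.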

Abstract

Traffic signal control (TSC) is a core component of intelligent transportation systems (ITS), aiming to reduce congestion, emissions, and travel time. Recent approaches based on reinforcement learning (RL) and large language models (LLMs) have improved adaptivity, but still suffer from limited interpretability, insufficient interaction data, and weak generalization to heterogeneous intersections. This paper proposes CuraLight, an LLM-centered framework where an RL agent assists the fine-tuning of an LLM-based traffic signal controller. The RL agent explores traffic environments and generates high-quality interaction trajectories, which are converted into prompt-response pairs for imitation fine-tuning. A multi-LLM ensemble deliberation system further evaluates candidate signal timing actions through structured debate, providing preference-aware supervision signals for training. Experiments conducted in SUMO across heterogeneous real-world networks from Jinan, Hangzhou, and Yizhuang demonstrate that CuraLight consistently outperforms state-of-the-art baselines, reducing average travel time by 5.34 percent, average queue length by 5.14 percent, and average waiting time by 7.02 percent. The results highlight the effectiveness of combining RL-assisted exploration with deliberation-based data curation for scalable and interpretable traffic signal control.
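
One way to picture how a structured debate could yield preference-aware supervision is sketched below. The abstract does not state how the ensemble's judgments are aggregated, so everything here is an assumption: each debater is imagined to give every candidate action a final numeric score, the scores are averaged, and the best and worst candidates form a chosen/rejected pair. The function name `build_preference` and the score scale are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional


def build_preference(
    prompt: str, scores: Dict[str, List[float]]
) -> Optional[Dict[str, str]]:
    """Turn per-debater scores into one preference pair (assumed aggregation rule).

    scores maps each candidate action to the final scores given by the debaters.
    """
    avg = {action: mean(vals) for action, vals in scores.items()}
    # Highest average first; ties broken alphabetically for determinism.
    ranked = sorted(avg, key=lambda a: (-avg[a], a))
    chosen, rejected = ranked[0], ranked[-1]
    if avg[chosen] == avg[rejected]:
        return None  # the debate did not separate the candidates
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


# Toy scores from three hypothetical debaters, for illustration only.
debate_scores = {
    "NS": [8, 7, 9],
    "EW": [4, 5, 3],
    "NS-left": [6, 6, 5],
}
pair = build_preference("Intersection J1: which phase next?", debate_scores)
```

Returning `None` when the candidates are indistinguishable keeps uninformative debates out of the training set, which is one simple reading of "data curation" in this setting.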