SignalClaw: LLM-Guided Evolutionary Synthesis of Interpretable Traffic Signal Control Skills

arXiv cs.AI / 4/8/2026


Key Points

  • SignalClaw is a framework that uses large language models to generate and evolve interpretable traffic-signal control skills, addressing both the opacity of neural policies learned by RL and the restrictive domain-specific languages that program synthesis depends on.
  • Each evolved skill is self-documenting, including human-readable rationale, selection guidance, and executable code so traffic engineers can inspect and modify policies directly.
  • Evolution is guided by simulation-derived metrics (e.g., queue percentiles, delay trends, and stagnation), which are converted into natural-language feedback for iterative improvement.
  • The system adds event-driven compositional evolution using a detector (via TraCI) and a dispatcher that selects specialized skills for emergency vehicles, transit priority, incidents, and congestion, enabling runtime composition without retraining.
  • In SUMO evaluations, SignalClaw comes within 3 to 10 percent of the best method's average delay on routine scenarios, with low variance across random seeds. In event-injected scenarios it substantially reduces emergency delay versus MaxPressure and DQN and transit person delay versus MaxPressure, while keeping overall delay stable under mixed events.
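
The skill structure and priority dispatch described in the points above can be pictured with a minimal Python sketch. This is not the authors' implementation: the `Observation` and `Skill` classes, the field names, the two toy policies, and in particular the ordering of `PRIORITY_CHAIN` are all assumptions made for illustration. Event detection via TraCI is left out, so the active events are passed in as a plain set.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set


@dataclass
class Observation:
    """Hypothetical per-intersection snapshot; field names are illustrative."""
    queues: Dict[str, float]            # phase name -> number of queued vehicles
    event_phase: Optional[str] = None   # phase that serves the detected event, if any


@dataclass
class Skill:
    """A self-documenting skill: rationale, selection guidance, executable policy."""
    name: str
    rationale: str
    selection_guidance: str
    policy: Callable[[Observation], str]  # returns the phase to serve next


def longest_queue(obs: Observation) -> str:
    """Toy routine policy: serve the phase with the longest queue."""
    return max(obs.queues, key=obs.queues.get)


def serve_event_phase(obs: Observation) -> str:
    """Toy event policy: serve the event's phase, else fall back to the routine rule."""
    return obs.event_phase or longest_queue(obs)


# Assumed priority order, highest first. The source names these event types
# but does not state how the chain is ordered.
PRIORITY_CHAIN = ("emergency", "transit", "incident", "congestion")


def dispatch(active_events: Set[str], skills: Dict[str, Skill], default: Skill) -> Skill:
    """Return the skill for the highest-priority active event, or the default skill."""
    for event in PRIORITY_CHAIN:
        if event in active_events and event in skills:
            return skills[event]
    return default


routine = Skill("routine", "Serve the longest queue.",
                "Use when no event is active.", longest_queue)
skills = {
    "emergency": Skill("emergency_preempt", "Clear the path of an emergency vehicle.",
                       "Use when an emergency vehicle is detected.", serve_event_phase),
    "transit": Skill("transit_priority", "Reduce person delay for transit riders.",
                     "Use when a transit vehicle is approaching.", serve_event_phase),
}

obs = Observation(queues={"NS": 4, "EW": 9}, event_phase="NS")
chosen = dispatch({"transit", "emergency"}, skills, routine)
print(chosen.name, "->", chosen.policy(obs))
```

Because each skill is just an entry in the `skills` dictionary, a new specialized skill can be registered or replaced without touching the others, which is the sense in which such a chain would allow runtime composition without retraining.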

Abstract

Traffic signal control (TSC) requires strategies that are both effective and interpretable for deployment, yet reinforcement learning produces opaque neural policies while program synthesis depends on restrictive domain-specific languages. We present SignalClaw, a framework that uses large language models (LLMs) as evolutionary skill generators to synthesize and refine interpretable control skills for adaptive TSC. Each skill includes rationale, selection guidance, and executable code, making policies human-inspectable and self-documenting. At each generation, evolution signals from simulation metrics such as queue percentiles, delay trends, and stagnation are translated into natural language feedback to guide improvement. SignalClaw also introduces event-driven compositional evolution: an event detector identifies emergency vehicles, transit priority, incidents, and congestion via TraCI, and a priority dispatcher selects specialized skills. Each skill is evolved independently, and a priority chain enables runtime composition without retraining. We evaluate SignalClaw on routine and event-injected SUMO scenarios against four baselines. On routine scenarios, it achieves average delay of 7.8 to 9.2 seconds, within 3 to 10 percent of the best method, with low variance across random seeds. Under event scenarios, it yields the lowest emergency delay (11.2 to 18.5 seconds, versus 42.3 to 72.3 for MaxPressure and 78.5 to 95.3 for DQN) and the lowest transit person delay (9.8 to 11.5 seconds, versus 38.7 to 45.2 for MaxPressure). In mixed events, the dispatcher composes skills effectively while maintaining stable overall delay. The evolved skills progress from simple linear rules to conditional strategies with multi-feature interactions, while remaining fully interpretable and directly modifiable by traffic engineers.
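
To make the idea of translating evolution signals into natural language feedback concrete, here is a small Python sketch. It is an illustration under stated assumptions, not the paper's method: the function names, the nearest-rank percentile, the `patience` and `tol` parameters, and the feedback wording are all hypothetical, and the sample numbers are made up for the demo.

```python
import math
from typing import List


def percentile(values: List[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def is_stagnant(delay_per_gen: List[float], patience: int = 3, tol: float = 0.01) -> bool:
    """True if the best delay improved by at most `tol` over the last `patience` generations."""
    if len(delay_per_gen) <= patience:
        return False
    earlier_best = min(delay_per_gen[:-patience])
    recent_best = min(delay_per_gen[-patience:])
    return earlier_best - recent_best <= tol


def build_feedback(queue_samples: List[float], delay_per_gen: List[float],
                   patience: int = 3) -> str:
    """Turn simulation metrics into a short text prompt for the next generation."""
    parts = [
        f"Queue length: median {percentile(queue_samples, 50):.1f}, "
        f"95th percentile {percentile(queue_samples, 95):.1f} vehicles."
    ]
    if len(delay_per_gen) >= 2:
        change = delay_per_gen[-1] - delay_per_gen[-2]
        if change > 0:
            parts.append(f"Average delay rose by {change:.2f} s since the previous generation.")
        elif change < 0:
            parts.append(f"Average delay fell by {-change:.2f} s since the previous generation.")
        else:
            parts.append("Average delay was unchanged since the previous generation.")
    if is_stagnant(delay_per_gen, patience):
        parts.append(f"No improvement in the last {patience} generations; "
                     "consider a structurally different rule.")
    return " ".join(parts)


# Made-up sample data: ten queue observations and five generations of delay.
feedback = build_feedback(
    queue_samples=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    delay_per_gen=[9.0, 8.5, 8.5, 8.5, 8.5],
)
print(feedback)
```

The resulting string would be appended to the LLM prompt for the next generation. Reporting a tail percentile alongside the median is one plausible way to surface rare but severe queue build-up that an average would hide.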