TrafficClaw: Generalizable Urban Traffic Control via Unified Physical Environment Modeling

arXiv cs.AI / 4/21/2026

Key Points

  • The paper argues that system-level urban traffic control requires a unified physical environment that couples heterogeneous subsystems (traffic signals, freeways, public transit, taxi services) rather than treating them as isolated tasks.
  • It proposes TrafficClaw, which integrates multiple traffic subsystems into a shared dynamical runtime to model cross-subsystem interactions and enable closed-loop feedback between agents and the environment.
  • TrafficClaw builds an LLM-based agent with executable spatiotemporal reasoning and reusable procedural memory to perform unified diagnostics across subsystems and iteratively refine strategies.
  • The approach includes a multi-stage training pipeline that combines supervised initialization with agentic reinforcement learning (RL) under system-level optimization, aiming for coordinated, system-aware performance.
  • Experiments reportedly show robust, transferable, and system-aware results on previously unseen traffic scenarios, dynamics, and task configurations, and the project is released on GitHub.
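
To make the idea of a shared dynamical runtime concrete, here is a minimal Python sketch of two coupled subsystems (one signalized road and one bus line) that draw on the same travel demand. It is not TrafficClaw's environment or API: the class `ToyUnifiedEnv`, the helper `rollout`, and every numeric constant are hypothetical, chosen only to show how an intervention in one subsystem can propagate to another.

```python
from dataclasses import dataclass


@dataclass
class ToyUnifiedEnv:
    """Hypothetical toy model: one road queue and one bus line sharing demand."""
    road_queue: float = 0.0    # vehicles waiting at the signal
    bus_waiting: float = 0.0   # passengers waiting for a bus
    demand: float = 100.0      # travellers arriving per step (shared by both modes)

    def step(self, green_share: float, buses: int) -> dict:
        # Shared demand: travellers pick the bus more often when the road is congested.
        bus_fraction = min(0.8, 0.2 + self.road_queue / 500.0)
        self.bus_waiting += self.demand * bus_fraction
        self.road_queue += self.demand * (1.0 - bus_fraction)
        # Signal subsystem: green time caps how many vehicles discharge (assumed rate).
        self.road_queue -= min(self.road_queue, 120.0 * green_share)
        # Transit subsystem: each bus serves up to 40 passengers (assumed capacity).
        self.bus_waiting -= min(self.bus_waiting, 40.0 * buses)
        return {"road_queue": self.road_queue, "bus_waiting": self.bus_waiting}


def rollout(green_share: float, buses: int, steps: int = 10) -> dict:
    """Apply a fixed control setting for several steps and return the last observation."""
    env = ToyUnifiedEnv()
    obs = {}
    for _ in range(steps):
        obs = env.step(green_share, buses)
    return obs


# Same transit setting, different signal setting.
short_green = rollout(green_share=0.3, buses=1)
long_green = rollout(green_share=0.8, buses=1)
print(short_green, long_green)
```

In this toy setup only the signal setting differs between the two runs, yet the bus line ends up with waiting passengers under the shorter green because road congestion shifts demand onto transit. That is the kind of cross-subsystem effect an isolated signal-control task would not expose.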

Abstract

Urban traffic control is a system-level coordination problem spanning heterogeneous subsystems, including traffic signals, freeways, public transit, and taxi services. Existing optimization-based, reinforcement learning (RL), and emerging LLM-based approaches are largely designed for isolated tasks, limiting both cross-task generalization and the ability to capture coupled physical dynamics across subsystems. We argue that effective system-level control requires a unified physical environment in which subsystems share infrastructure, mobility demand, and spatiotemporal constraints, allowing local interventions to propagate through the network. To this end, we propose TrafficClaw, a framework for general urban traffic control built upon a unified runtime environment. TrafficClaw integrates heterogeneous subsystems into a shared dynamical system, enabling explicit modeling of cross-subsystem interactions and closed-loop agent-environment feedback. Within this environment, we develop an LLM agent with executable spatiotemporal reasoning and reusable procedural memory, supporting unified diagnostics across subsystems and continual strategy refinement. Furthermore, we introduce a multi-stage training pipeline with supervised initialization and agentic RL with system-level optimization, further enabling coordinated and system-aware performance. Experiments demonstrate that TrafficClaw achieves robust, transferable, and system-aware performance across unseen traffic scenarios, dynamics, and task configurations. Our project is available at https://github.com/usail-hkust/TrafficClaw.
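
The abstract does not spell out how reusable procedural memory works, so the following is only one plausible reading, sketched in Python under stated assumptions: a procedure that helped in a diagnosed situation is stored under that situation's label and reused when the same diagnosis recurs. The names `ProceduralMemory`, `diagnose`, `extend_green`, `default_plan`, and `act`, along with the threshold and action values, are hypothetical and do not come from the paper or its repository; the hand-written rule in `diagnose` stands in for the LLM agent's reasoning.

```python
from typing import Callable, Dict, Optional

# Hypothetical: a "procedure" is a function from an observation to a control action.
Procedure = Callable[[dict], dict]


class ProceduralMemory:
    """Hypothetical store that maps a situation label to a reusable procedure."""

    def __init__(self) -> None:
        self._store: Dict[str, Procedure] = {}

    def save(self, situation: str, procedure: Procedure) -> None:
        self._store[situation] = procedure

    def recall(self, situation: str) -> Optional[Procedure]:
        return self._store.get(situation)


def diagnose(obs: dict) -> str:
    """Label the situation with a coarse rule (stand-in for the agent's reasoning)."""
    return "road_congested" if obs["road_queue"] > 50 else "nominal"


def extend_green(obs: dict) -> dict:
    """Assumed remedy for congestion: give the approach more green time."""
    return {"green_share": 0.8}


def default_plan(obs: dict) -> dict:
    """Fallback used when memory holds nothing for the diagnosed situation."""
    return {"green_share": 0.5}


def act(obs: dict, memory: ProceduralMemory) -> dict:
    """Diagnose, then reuse a stored procedure if one exists for that diagnosis."""
    procedure = memory.recall(diagnose(obs)) or default_plan
    return procedure(obs)


memory = ProceduralMemory()
first = act({"road_queue": 90}, memory)       # nothing stored yet: falls back to default
memory.save("road_congested", extend_green)   # keep a procedure that worked
second = act({"road_queue": 90}, memory)      # same diagnosis now reuses it
print(first, second)
```

Keying memory on a diagnosis rather than on raw observations is a design choice made here for brevity; it lets one stored procedure apply to any observation that receives the same label.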