Emergency Preemption Without Online Exploration: A Decision Transformer Approach

arXiv cs.AI / 3/25/2026


Key Points

  • The paper proposes a return-conditioned sequence modeling approach, based on the Decision Transformer (DT), for emergency vehicle corridor optimization. Policy learning is fully offline and requires no online environment interaction.
  • A single target-return scalar provides dispatch-level urgency control, allowing smooth tradeoffs between emergency vehicle travel time and civilian delay without retraining.
  • In LightSim experiments on a 4x4 grid, DT reduces average emergency vehicle (EV) travel time by 37.7% versus fixed-timing preemption (88.6 s vs. 142.3 s) and achieves the lowest civilian delay (11.3 s/veh) and fewest EV stops (1.2) among the compared methods, including online RL baselines.
  • A multi-agent extension, the Multi-Agent Decision Transformer (MADT) with graph attention for spatial coordination, overtakes DT on larger grids, delivering a 45.2% travel-time reduction on 8x8.
  • A Constrained DT variant adds an explicit civilian disruption budget as a second control parameter to make the time-delay tradeoff more controllable.

Abstract

Emergency vehicle (EV) response time is a critical determinant of survival outcomes, yet deployed signal preemption strategies remain reactive and uncontrollable. We propose a return-conditioned framework for emergency corridor optimization based on the Decision Transformer (DT). By casting corridor optimization as offline, return-conditioned sequence modeling, our approach (1) eliminates online environment interaction during policy learning, (2) enables dispatch-level urgency control through a single target-return scalar, and (3) extends to multi-agent settings via a Multi-Agent Decision Transformer (MADT) with graph attention for spatial coordination. On the LightSim simulator, DT reduces average EV travel time by 37.7% relative to fixed-timing preemption on a 4x4 grid (88.6 s vs. 142.3 s), achieving the lowest civilian delay (11.3 s/veh) and fewest EV stops (1.2) among all methods, including online RL baselines that require environment interaction. MADT further improves on larger grids, overtaking DT with 45.2% reduction on 8x8 via graph-attention coordination. Return conditioning produces a smooth dispatch interface: varying the target return from 100 to -400 trades EV travel time (72.4-138.2 s) against civilian delay (16.8-5.4 s/veh), requiring no retraining. A Constrained DT extension adds explicit civilian disruption budgets as a second control knob.
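
To make the return-conditioning idea in the abstract concrete, the following is a minimal Python sketch of the two pieces a Decision Transformer style pipeline typically needs: computing returns-to-go from logged (offline) trajectories, and updating the target return during a rollout. It is not taken from the paper. The function names, the undiscounted return, and the toy reward values are all assumptions made for illustration, and the trained model itself is omitted.

```python
from typing import List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted return-to-go: rtg[t] is the sum of rewards[t:].

    Assumption: no discounting, as in the standard Decision Transformer
    setup; the paper's exact return definition is not given in the abstract.
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Accumulate from the end of the trajectory back to the start.
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def build_sequence(
    states: Sequence[object],
    actions: Sequence[object],
    rewards: Sequence[float],
) -> List[Tuple[float, object, object]]:
    """Pair each step of a logged trajectory with its return-to-go.

    Each tuple (rtg_t, state_t, action_t) is one training step for a
    return-conditioned sequence model. No simulator calls are needed here,
    which is what makes the training offline.
    """
    rtg = returns_to_go(rewards)
    return [(rtg[t], states[t], actions[t]) for t in range(len(rewards))]


def rollout_targets(target_return: float, observed_rewards: Sequence[float]) -> List[float]:
    """Track the remaining target return during a rollout.

    The dispatcher picks an initial target; after each step the reward
    actually received is subtracted, so the model is always conditioned
    on the return it still has to achieve.
    """
    targets = [target_return]
    for r in observed_rewards:
        targets.append(targets[-1] - r)
    return targets


# Toy logged trajectory with hypothetical per-step rewards (negative = cost).
demo_rewards = [-1.0, -2.0, -3.0]
demo_sequence = build_sequence(["s0", "s1", "s2"], ["a0", "a1", "a2"], demo_rewards)

# Two hypothetical dispatch settings using the endpoints quoted in the abstract.
first_targets = rollout_targets(100.0, [-10.0, -20.0])
second_targets = rollout_targets(-400.0, [-10.0, -20.0])
```

Under these assumptions, the only thing that changes between dispatch settings is the initial scalar passed to `rollout_targets`; the model weights stay fixed, which is the sense in which the abstract describes urgency control without retraining.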
