Emergency Preemption Without Online Exploration: A Decision Transformer Approach

arXiv cs.AI / 3/25/2026



Key Points

The paper proposes a return-conditioned sequence modeling approach, based on the Decision Transformer (DT), for emergency vehicle (EV) corridor optimization. Training is fully offline and requires no online environment interaction.
It introduces a single target-return scalar to provide dispatch-level urgency control, allowing smooth tradeoffs between emergency vehicle travel time and civilian delay without retraining.
In LightSim experiments on a 4x4 grid, the DT approach reduces average emergency vehicle travel time by 37.7% versus fixed-timing preemption and achieves the lowest civilian delay and fewest EV stops among compared methods.
A multi-agent extension, the Multi-Agent Decision Transformer (MADT) with graph attention, overtakes DT on larger grids, delivering a 45.2% travel-time reduction on an 8x8 grid.
A Constrained DT variant adds an explicit civilian disruption budget as a second control parameter, making the tradeoff between EV travel time and civilian delay more controllable.
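As a rough illustration of what "return-conditioned sequence modeling" means in the key points above, the sketch below computes returns-to-go from a logged reward sequence and interleaves them with states and actions, which is the standard Decision Transformer input layout. The function names, the undiscounted sum, and the toy reward values are assumptions made for illustration; the summary does not describe the paper's reward definition or tokenization.

```python
from typing import Any, List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted suffix sums: rtg[t] = rewards[t] + rewards[t+1] + ... (assumed form)."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def interleave(
    rtg: Sequence[float], states: Sequence[Any], actions: Sequence[Any]
) -> List[Tuple[str, Any]]:
    """Build the (return-to-go, state, action) token order used for offline training."""
    tokens: List[Tuple[str, Any]] = []
    for r, s, a in zip(rtg, states, actions):
        tokens.extend([("rtg", r), ("state", s), ("action", a)])
    return tokens


def next_target(current_target: float, observed_reward: float) -> float:
    """At inference time, the conditioning target is reduced by each reward received."""
    return current_target - observed_reward


# Hypothetical logged trajectory: negative rewards stand in for per-step delay.
rewards = [-1.0, -2.0, -3.0]
rtg = returns_to_go(rewards)
tokens = interleave(rtg, ["s0", "s1", "s2"], ["a0", "a1", "a2"])
```

Because the model is trained only on logged trajectories labeled this way, choosing a different initial target at inference time changes behavior without any further training, which is the property the urgency control relies on.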

Abstract

Emergency vehicle (EV) response time is a critical determinant of survival outcomes, yet deployed signal preemption strategies remain reactive and uncontrollable. We propose a return-conditioned framework for emergency corridor optimization based on the Decision Transformer (DT). By casting corridor optimization as offline, return-conditioned sequence modeling, our approach (1) eliminates online environment interaction during policy learning, (2) enables dispatch-level urgency control through a single target-return scalar, and (3) extends to multi-agent settings via a Multi-Agent Decision Transformer (MADT) with graph attention for spatial coordination. On the LightSim simulator, DT reduces average EV travel time by 37.7% relative to fixed-timing preemption on a 4x4 grid (88.6 s vs. 142.3 s), achieving the lowest civilian delay (11.3 s/veh) and fewest EV stops (1.2) among all methods, including online RL baselines that require environment interaction. MADT further improves on larger grids, overtaking DT with 45.2% reduction on 8x8 via graph-attention coordination. Return conditioning produces a smooth dispatch interface: varying the target return from 100 to -400 trades EV travel time (72.4-138.2 s) against civilian delay (16.8-5.4 s/veh), requiring no retraining. A Constrained DT extension adds explicit civilian disruption budgets as a second control knob.
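The dispatch interface described in the abstract can be pictured with a small sketch. It assumes a hypothetical urgency level in [0, 1] that is mapped linearly onto the reported target-return range of -400 to 100, with higher urgency meaning a higher target return (read from the order in which the abstract lists its ranges). The mapping, the names, and the linearity are illustrative assumptions, not the paper's method. The second helper only re-derives the 37.7% figure from the two travel times quoted in the abstract.

```python
# Endpoints of the target-return range quoted in the abstract.
LOW_TARGET = -400.0   # reported alongside 138.2 s EV travel time, 5.4 s/veh civilian delay
HIGH_TARGET = 100.0   # reported alongside 72.4 s EV travel time, 16.8 s/veh civilian delay


def target_return_for_urgency(urgency: float) -> float:
    """Map a hypothetical dispatcher urgency in [0, 1] to a target return (assumed linear)."""
    if not 0.0 <= urgency <= 1.0:
        raise ValueError("urgency must lie in [0, 1]")
    return LOW_TARGET + urgency * (HIGH_TARGET - LOW_TARGET)


def relative_reduction(baseline: float, value: float) -> float:
    """Fractional reduction of `value` relative to `baseline`."""
    return (baseline - value) / baseline


# Travel times from the abstract: fixed-timing preemption vs. DT on the 4x4 grid.
reduction_pct = round(100 * relative_reduction(142.3, 88.6), 1)
```

Under these assumptions, a mid-level urgency of 0.5 would condition the policy on a target return of -150; what travel time and civilian delay that setting yields is not stated in the abstract.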
