Semi-Markov Reinforcement Learning for City-Scale EV Ride-Hailing with Feasibility-Guaranteed Actions
arXiv cs.AI / 4/29/2026
Key Points
- The paper presents a city-scale EV ride-hailing fleet control framework that jointly optimizes dispatch, repositioning, and charging while respecting charger and feeder limits under uncertain, spatially correlated demand and travel times.
- It formulates the problem as a hex-grid semi-Markov decision process (semi-MDP) with mixed actions (discrete service/reposition/charge decisions plus continuous charging power) and variable action durations.
- To ensure physical feasibility in both training and deployment, the policy uses masked, temperature-annealed high-level intentions that are enforced at each step via a time-limited rolling mixed-integer linear program (MILP) with strict state-of-charge, port, and feeder constraints.
- For robustness to distribution shifts, the method trains a Soft Actor–Critic (SAC) agent using a Wasserstein-1 ambiguity set with a graph-aligned Mahalanobis metric, and applies a robust backup based on Kantorovich–Rubinstein duality with primal–dual risk-budget updates.
- Experiments in an EV simulator driven by NYC taxi data show the paper's proposed agent, PD–RSAC, delivering the highest net profit (about $1.22M, versus $0.58M–$0.70M for several heuristic and RL baselines) while incurring zero feeder-limit violations.
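To make the feasibility-masking idea concrete, here is a minimal sketch of masked, temperature-annealed intention sampling. The paper's actual mechanism hands the sampled intention to a rolling MILP for enforcement; this sketch only shows the upstream masking step. The function name, the NumPy implementation, and the annealing interpretation are illustrative assumptions, not the authors' code.

```python
import numpy as np

def masked_intention_probs(logits, feasible_mask, temperature=1.0):
    """Convert policy logits over high-level intentions (e.g. serve /
    reposition / charge) into a distribution that assigns exactly zero
    probability to infeasible intentions.

    Infeasible entries are set to -inf before the softmax, so they can
    never be sampled; annealing `temperature` toward 0 over training
    makes the policy increasingly greedy among feasible intentions.
    (Hypothetical helper; not from the paper.)
    """
    scaled = np.where(feasible_mask,
                      np.asarray(logits, dtype=float) / temperature,
                      -np.inf)
    scaled -= scaled.max()          # numerically stable softmax
    p = np.exp(scaled)
    return p / p.sum()

# Example: intention 1 is infeasible (e.g. no free charger port)
probs = masked_intention_probs([2.0, 1.0, 0.0], [True, False, True])
# probs[1] is exactly 0.0; the rest renormalize over feasible intentions
```

In the paper's pipeline, whichever intention is sampled would then be checked and, if needed, repaired by the time-limited rolling MILP before execution.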

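The robust backup in the fourth bullet rests on a standard consequence of Kantorovich–Rubinstein duality: the worst-case expectation of a value function over a Wasserstein-1 ball of radius ε around the empirical next-state distribution is bounded below by the empirical mean minus ε times the value function's Lipschitz constant. The sketch below illustrates that bound with a crude pairwise Lipschitz estimate and Euclidean distances; the paper instead uses a graph-aligned Mahalanobis metric and primal–dual risk-budget updates, which are not reproduced here. All names are hypothetical.

```python
import numpy as np

def robust_backup(rewards, next_values, next_states, gamma=0.99, radius=0.1):
    """One-step Wasserstein-1 robust Bellman backup (illustrative sketch).

    By the Kantorovich-Rubinstein bound, the infimum of E[V] over all
    distributions within W1 distance `radius` of the empirical next-state
    distribution is at least mean(V) - radius * Lip(V). We estimate Lip(V)
    pairwise from the sampled next states (Euclidean metric here; the
    paper uses a graph-aligned Mahalanobis metric instead).
    """
    lip = 0.0
    n = len(next_states)
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(next_states[i] - next_states[j])
            if d > 0:
                lip = max(lip, abs(next_values[i] - next_values[j]) / d)
    worst_case_value = np.mean(next_values) - radius * lip
    return np.mean(rewards) + gamma * worst_case_value
```

Note the pessimism penalty `radius * lip` is what shrinks as the primal–dual updates tighten the risk budget: a smaller ambiguity radius recovers the ordinary (non-robust) SAC backup.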