Beyond Scalar Rewards: Distributional Reinforcement Learning with Preordered Objectives for Safe and Reliable Autonomous Driving
arXiv cs.RO / 2026-03-24
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper argues that scalarizing multiple driving objectives in RL (e.g., safety vs. efficiency) can collapse priority information and lead to policies that violate safety-critical constraints.
- It introduces the Preordered Multi-Objective MDP (Pr-MOMDP), which represents objectives with an explicit precedence (preorder) structure rather than combining them into a single weighted reward.
- To operationalize this, the authors extend distributional RL using Quantile Dominance (QD), a pairwise comparison metric that evaluates action return distributions without compressing them into one statistic.
- They propose an algorithm for extracting non-dominated action subsets across objectives, so precedence directly shapes both decision-making and training targets.
- Experiments in CARLA using Implicit Quantile Networks (IQN) show higher success rates, fewer collisions and off-road events, and more robust policies than IQN and ensemble-IQN baselines.
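The precedence-based action filtering described above could be sketched as follows. This is a minimal illustration, not the paper's algorithm: `quantile_dominates` and `nondominated_actions` are hypothetical helpers, and the quantile-wise stochastic-dominance test is an assumed instantiation of the Quantile Dominance idea (comparing full quantile vectors instead of collapsing each distribution to one statistic).

```python
import numpy as np

def quantile_dominates(q_a: np.ndarray, q_b: np.ndarray) -> bool:
    """Assumed QD test: action a dominates action b if a's estimated
    return quantile is at least b's at every quantile level, and
    strictly greater at some level (quantile-wise first-order
    stochastic dominance)."""
    return bool(np.all(q_a >= q_b) and np.any(q_a > q_b))

def nondominated_actions(quantiles_per_objective: list) -> list:
    """Filter actions by objective precedence. Objectives are ordered
    from highest to lowest priority (e.g. [safety, efficiency]); each
    array has shape (num_actions, num_quantiles). At each precedence
    level, drop any surviving action that is quantile-dominated by
    another survivor; remaining ties are broken by the next objective."""
    survivors = list(range(quantiles_per_objective[0].shape[0]))
    for q in quantiles_per_objective:  # highest-priority objective first
        survivors = [a for a in survivors
                     if not any(quantile_dominates(q[b], q[a])
                                for b in survivors if b != a)]
        if len(survivors) == 1:        # precedence already decides
            break
    return survivors

# Toy example: 3 actions, 4 quantiles, safety takes precedence.
safety = np.array([[0.9, 1.0, 1.1, 1.2],   # action 0
                   [0.1, 0.2, 0.3, 0.4],   # action 1: dominated on safety
                   [0.9, 1.0, 1.1, 1.2]])  # action 2: ties action 0
efficiency = np.array([[0.5, 0.6, 0.7, 0.8],
                       [2.0, 2.1, 2.2, 2.3],
                       [1.0, 1.1, 1.2, 1.3]])
print(nondominated_actions([safety, efficiency]))  # → [2]
```

In this toy case, action 1 is eliminated on the safety objective despite having the best efficiency quantiles, which is the point of keeping precedence explicit instead of scalarizing: no efficiency gain can buy back a safety loss at a higher precedence level.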

