Autonomous Vehicle Collision Avoidance With Racing Parameterized Deep Reinforcement Learning

arXiv cs.RO / 4/21/2026


Key Points

  • The paper proposes an out-of-distribution (OOD) collision-avoidance policy for autonomous vehicles using parameterized deep reinforcement learning (DRL) that aims to stay within nonlinear vehicle dynamics while remaining computationally efficient.
  • It trains policies in simulation using a “race car overtaking” setup, leveraging a physics-informed, simulator-exploit-aware reward rather than explicit geometric trajectory guidance.
  • Two DRL variants are evaluated (a default overtaking policy and a reversed-heading variant), and both are reported to outperform a Model Predictive Control with Artificial Potential Function (MPC-APF) baseline across multiple intersection collision scenarios.
  • The approach is claimed to transfer zero-shot to proportionally scaled hardware while requiring 31× fewer floating-point operations (FLOPs) and achieving 64× lower inference latency.
  • In head-to-head collision tests, the reversed-heading policy improves performance by 30% over the default DRL racing policy and by 50% over the MPC-APF baseline; in side collisions, both DRL policies achieve roughly 10% greater evasion than numerical optimal control.
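
To illustrate the "physics-informed, simulator-exploit-aware reward" idea mentioned above, here is a minimal sketch. All function names, coefficients, and thresholds are hypothetical assumptions for illustration, not taken from the paper: it rewards forward progress, softly penalizes accelerations outside the tire friction circle (a simple proxy for nonlinear kinodynamic limits), and heavily penalizes states that typically indicate simulator exploits such as corner-cutting or clipping through geometry.

```python
import math

def physics_informed_reward(progress, lat_accel, long_accel,
                            mu=1.0, g=9.81,
                            off_track=False, collided=False):
    """Hypothetical sketch of a physics-informed, simulator-exploit-aware
    reward term: rewards per-step progress, penalizes accelerations that
    exceed the tire friction circle, and penalizes exploit-like states."""
    reward = progress  # progress along the route this step (meters)

    # Friction-circle constraint: sqrt(a_lat^2 + a_long^2) <= mu * g.
    # A soft penalty beyond the grip limit discourages physically
    # infeasible maneuvers the simulator might otherwise permit.
    total_accel = math.hypot(lat_accel, long_accel)
    limit = mu * g
    if total_accel > limit:
        reward -= 2.0 * (total_accel - limit)

    # Simulator-exploit-aware terms: discourage leaving the drivable
    # surface or colliding, behaviors a naive progress reward can exploit.
    if off_track:
        reward -= 5.0
    if collided:
        reward -= 50.0
    return reward
```

In a DRL training loop this term would be computed at every control step from the simulator state; the key design choice is encoding the vehicle's grip limit directly in the reward rather than tracking an explicit geometric reference trajectory.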

Abstract

Road traffic accidents are a leading cause of fatalities worldwide. In the US, human error causes 94% of crashes, resulting in over 7,000 pedestrian fatalities and $500 billion in costs annually. Autonomous Vehicles (AVs) with emergency collision avoidance systems that operate at the limits of vehicle dynamics at high frequency, a dual constraint of nonlinear kinodynamic accuracy and computational efficiency, further enhance safety during adverse weather and cybersecurity breaches, and help evade dangerous human driving when AVs and human drivers share roads. This paper parameterizes a Deep Reinforcement Learning (DRL) collision-avoidance policy Out-Of-Distribution (OOD) via race car overtaking in simulation, without explicit geometric reference-trajectory guidance, using a physics-informed, simulator-exploit-aware reward to encode nonlinear vehicle kinodynamics. Two policies are evaluated: a default uni-directional policy and a reversed-heading variant that navigates in the opposite direction to other cars. Both consistently outperform a Model Predictive Control and Artificial Potential Function (MPC-APF) baseline, with zero-shot transfer to proportionally scaled hardware, across three intersection collision scenarios, at 31× fewer floating-point operations (FLOPs) and 64× lower inference latency. The reversed-heading policy outperforms the default racing overtaking policy in head-to-head collisions by 30% and the baseline by 50%, and matches the former in side collisions, where both DRL policies evade 10% better than numerical optimal control.