Reinforcement Learning for Testing Interdependent Requirements in Autonomous Vehicles: An Empirical Study

arXiv cs.RO · April 29, 2026


Key Points

  • The study investigates whether single-objective reinforcement learning (SORL) or multi-objective reinforcement learning (MORL) better generates scenario-based tests for autonomous vehicles when requirements are interdependent and trade off against each other.
  • Using an end-to-end AV controller and a high-fidelity simulator, the authors empirically compare SORL and MORL at producing critical scenarios that target violations of both functional and safety requirements.
  • Results indicate that SORL and MORL are often comparably effective overall, but they differ in violation characteristics: MORL more frequently generates requirement-violation scenarios, while SORL tends to produce higher-severity violations.
  • The relative performance varies with the specific combinations of objectives and, to a lesser extent, with road conditions, and MORL generally provides greater scenario diversity and coverage.
  • The paper fills an evaluation gap by systematically comparing SORL and MORL, emphasizing that accounting for requirement dependencies is important for RL-based AV testing strategy selection.
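The core SORL/MORL distinction in the key points above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the objective signals, weights, and variable names are hypothetical stand-ins for whatever the authors' reward design actually uses.

```python
import numpy as np

# Hypothetical per-step objective signals from a simulated driving scenario:
# o1 = degree of a functional-requirement violation (e.g. lane deviation),
# o2 = degree of a safety-requirement violation (e.g. inverse time-to-collision).
objectives = np.array([0.4, 0.9])

# SORL: scalarize the interdependent objectives into one reward with fixed
# weights, so a single scalar signal drives one policy.
weights = np.array([0.5, 0.5])
sorl_reward = float(weights @ objectives)

# MORL: keep the reward as a vector with one component per requirement; the
# learner (e.g. a Pareto-based method) handles the trade-off explicitly
# instead of committing to fixed weights up front.
morl_reward = objectives.copy()

print(sorl_reward)   # 0.65
print(morl_reward)   # [0.4 0.9]
```

The trade-off the paper studies falls out of this design choice: fixed weights let SORL push hard in one scalarized direction (higher-severity violations), while the vector reward lets MORL trade objectives off per scenario (more, and more varied, violation scenarios).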

Abstract

Autonomous vehicles (AVs) make driving decisions without humans, making dependability assurance critical. Scenario-based testing is widely used to evaluate AVs under diverse conditions, with reinforcement learning (RL) generating test scenarios that identify violations of functional and safety requirements. Many requirements are interdependent and involve trade-offs, making it unclear whether single-objective RL (SORL), which combines objectives into a single reward, can reliably reveal violations or whether multi-objective RL (MORL), which explicitly considers multiple objectives, is necessary. We present an empirical evaluation comparing SORL and MORL for generating critical scenarios that simultaneously test interdependent requirements using an end-to-end AV controller and high-fidelity simulator. Results suggest that MORL and SORL differ mainly in how violations occur, while showing comparable effectiveness in many cases. MORL tends to generate more requirement-violation scenarios, whereas SORL produces higher-severity violations. Their relative performance also depends on specific objective combinations and, to a lesser extent, road conditions. Regarding diversity, MORL consistently covers a broader range of scenarios. Thus, MORL is preferable when scenario diversity and coverage are prioritized, whereas SORL may better expose severe violations. Our empirical evaluation addresses a gap by systematically comparing SORL and MORL, highlighting the importance of requirement dependencies in RL-based AV testing.
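The abstract's claim that "MORL consistently covers a broader range of scenarios" presupposes some coverage measure. As a rough illustration of how such a measure can work, here is a simple grid-coverage proxy; the binning scheme and the example scenario tuples are assumptions for the sketch, not the metric used in the paper.

```python
def coverage(scenarios, bins=4, lo=0.0, hi=1.0):
    """Fraction of parameter-grid cells touched by at least one scenario.

    Each scenario is a tuple of normalized parameters (e.g. fog density,
    obstacle speed); more distinct cells touched means broader diversity.
    """
    width = (hi - lo) / bins
    cells = {
        tuple(min(int((p - lo) / width), bins - 1) for p in s)
        for s in scenarios
    }
    total = bins ** len(scenarios[0])
    return len(cells) / total

# Hypothetical 2-parameter scenario sets from two generators: one clustered
# around a single severe region, one spread across the parameter space.
clustered = [(0.10, 0.10), (0.12, 0.15), (0.11, 0.12)]
spread    = [(0.10, 0.10), (0.60, 0.20), (0.30, 0.90)]

print(coverage(clustered))  # 0.0625  (1 of 16 cells)
print(coverage(spread))     # 0.1875  (3 of 16 cells)
```

Under a metric of this kind, a generator that concentrates on high-severity scenarios can score lower on coverage than one that explores widely, which matches the paper's conclusion that SORL and MORL suit different testing priorities.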