RL-STPA: Adapting System-Theoretic Hazard Analysis for Safety-Critical Reinforcement Learning

arXiv cs.LG · April 17, 2026


Key Points

  • The paper argues that current RL evaluation methods struggle in safety-critical settings due to neural-network “black-box” behavior and distribution shift between training and real deployment.
  • It proposes RL-STPA, an adaptation of STPA (System-Theoretic Process Analysis) to reinforcement learning that decomposes tasks hierarchically into subtasks, using temporal phase analysis and domain expertise to capture emergent behaviors.
  • RL-STPA adds coverage-guided perturbation testing to probe policy sensitivity across the state-action space, uncovering hazardous loss scenarios that standard evaluations may miss.
  • The framework uses iterative checkpoints to feed detected hazards back into training via reward shaping and curriculum design, aiming to improve safety and robustness over time.
  • Experiments on autonomous drone navigation and landing show RL-STPA can identify safety-relevant failure modes while acknowledging that it does not offer formal guarantees for arbitrary neural policies.
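The coverage-guided perturbation testing described above can be sketched roughly as follows. This is an illustrative toy, not the paper's implementation: the policy, dynamics, hazard predicate, and coarse binning scheme are all assumptions made for the sake of a runnable example.

```python
import random

def policy(state):
    # Toy "descend" policy for a drone: hold horizontal position, drop altitude.
    x, alt = state
    return (0.0, -0.5)

def step(state, action):
    # Toy one-step dynamics: apply the action directly to the state.
    return (state[0] + action[0], state[1] + action[1])

def is_hazard(state):
    # Illustrative hazard: touching down (altitude < 0) while off the pad (|x| > 1).
    x, alt = state
    return alt < 0.0 and abs(x) > 1.0

def perturbation_test(nominal_states, n_perturb=20, noise=0.5, seed=0):
    """Perturb each nominal state, roll the policy forward one step, and
    record which discretized state bins were covered and which led to hazards."""
    rng = random.Random(seed)
    covered, hazardous = set(), set()
    for sx, salt in nominal_states:
        for _ in range(n_perturb):
            s = (sx + rng.uniform(-noise, noise),
                 salt + rng.uniform(-noise, noise))
            bin_key = (round(s[0]), round(s[1]))  # coarse coverage bin
            covered.add(bin_key)
            if is_hazard(step(s, policy(s))):
                hazardous.add(bin_key)
    return len(covered), hazardous

coverage, hazards = perturbation_test([(0.0, 0.3), (2.0, 0.3)])
print(coverage, sorted(hazards))
```

The coverage count gives a crude safety-coverage metric of the kind the framework quantifies, while the hazardous bins are candidates to feed back into training.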

Abstract

As reinforcement learning (RL) deployments expand into safety-critical domains, existing evaluation methods fail to systematically identify hazards arising from the black-box nature of neural-network-enabled policies and distributional shift between training and deployment. This paper introduces Reinforcement Learning System-Theoretic Process Analysis (RL-STPA), a framework that adapts conventional STPA's systematic hazard analysis to address RL's unique challenges through three key contributions: hierarchical subtask decomposition using both temporal phase analysis and domain expertise to capture emergent behaviors, coverage-guided perturbation testing that explores the sensitivity of state-action spaces, and iterative checkpoints that feed identified hazards back into training through reward shaping and curriculum design. We demonstrate RL-STPA in the safety-critical test case of autonomous drone navigation and landing, revealing potential loss scenarios that can be missed by standard RL evaluations. The proposed framework provides practitioners with a toolkit for systematic hazard analysis, quantitative metrics for safety coverage assessment, and actionable guidelines for establishing operational safety bounds. While RL-STPA cannot provide formal guarantees for arbitrary neural policies, it offers a practical methodology for systematically evaluating and improving RL safety and robustness in safety-critical applications where exhaustive verification methods remain intractable.
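The iterative feedback loop the abstract describes, feeding identified hazards back into training via reward shaping, might look like the minimal sketch below. The hazard set, binning, and penalty magnitude are hypothetical choices for illustration; the paper's actual shaping and curriculum design may differ.

```python
def shaped_reward(base_reward, state, hazard_regions, penalty=-10.0):
    """Add a penalty whenever the agent enters a state bin that a prior
    RL-STPA analysis pass flagged as hazardous."""
    bin_key = (round(state[0]), round(state[1]))  # same coarse bins as the analysis
    if bin_key in hazard_regions:
        return base_reward + penalty
    return base_reward

# Bins flagged hazardous in a prior analysis pass (illustrative values).
hazards = {(2, 0), (2, -1)}
print(shaped_reward(1.0, (2.1, -0.2), hazards))  # flagged bin: prints -9.0
print(shaped_reward(1.0, (0.0, 0.5), hazards))   # safe bin: prints 1.0
```

On each checkpoint, retraining against the shaped reward is meant to steer the policy away from the flagged regions, after which the analysis can be re-run to measure whether those hazards recur.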