RL-STPA: Adapting System-Theoretic Hazard Analysis for Safety-Critical Reinforcement Learning

arXiv cs.LG · April 17, 2026


Key Points

  • The paper argues that current RL evaluation methods struggle in safety-critical settings due to neural-network “black-box” behavior and distribution shift between training and real deployment.
  • It proposes RL-STPA, an adaptation of STPA (System-Theoretic Process Analysis) to reinforcement learning that decomposes tasks hierarchically into subtasks, using temporal phase analysis and domain expertise to capture emergent behaviors.
  • RL-STPA adds coverage-guided perturbation testing to probe policy sensitivity across the state-action space, uncovering hazardous loss scenarios that standard evaluations may miss.
  • The framework uses iterative checkpoints to feed detected hazards back into training via reward shaping and curriculum design, aiming to improve safety and robustness over time.
  • Experiments on autonomous drone navigation and landing show RL-STPA can identify safety-relevant failure modes while acknowledging that it does not offer formal guarantees for arbitrary neural policies.
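The coverage-guided perturbation testing described above can be sketched roughly as follows. This is an illustrative toy, not the paper's implementation: the policy, dynamics, hazard predicate, and coarse binning scheme are all assumptions made for the sake of a runnable example.

```python
import random

def policy(state):
    # Toy "descend" policy for a drone: hold horizontal position, drop altitude.
    x, alt = state
    return (0.0, -0.5)

def step(state, action):
    # Toy one-step dynamics: apply the action directly to the state.
    return (state[0] + action[0], state[1] + action[1])

def is_hazard(state):
    # Illustrative hazard: touching down (altitude < 0) while off the pad (|x| > 1).
    x, alt = state
    return alt < 0.0 and abs(x) > 1.0

def perturbation_test(nominal_states, n_perturb=20, noise=0.5, seed=0):
    """Perturb each nominal state, roll the policy forward one step, and
    record which discretized state bins were covered and which led to hazards."""
    rng = random.Random(seed)
    covered, hazardous = set(), set()
    for sx, salt in nominal_states:
        for _ in range(n_perturb):
            s = (sx + rng.uniform(-noise, noise),
                 salt + rng.uniform(-noise, noise))
            bin_key = (round(s[0]), round(s[1]))  # coarse coverage bin
            covered.add(bin_key)
            if is_hazard(step(s, policy(s))):
                hazardous.add(bin_key)
    return len(covered), hazardous

coverage, hazards = perturbation_test([(0.0, 0.3), (2.0, 0.3)])
print(coverage, sorted(hazards))
```

The coverage count gives a crude safety-coverage metric of the kind the framework quantifies, while the hazardous bins are candidates to feed back into training.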

Abstract

As reinforcement learning (RL) deployments expand into safety-critical domains, existing evaluation methods fail to systematically identify hazards arising from the black-box nature of neural-network-enabled policies and distributional shift between training and deployment. This paper introduces Reinforcement Learning System-Theoretic Process Analysis (RL-STPA), a framework that adapts conventional STPA's systematic hazard analysis to address RL's unique challenges through three key contributions: hierarchical subtask decomposition using both temporal phase analysis and domain expertise to capture emergent behaviors, coverage-guided perturbation testing that explores the sensitivity of state-action spaces, and iterative checkpoints that feed identified hazards back into training through reward shaping and curriculum design. We demonstrate RL-STPA in the safety-critical test case of autonomous drone navigation and landing, revealing potential loss scenarios that can be missed by standard RL evaluations. The proposed framework provides practitioners with a toolkit for systematic hazard analysis, quantitative metrics for safety coverage assessment, and actionable guidelines for establishing operational safety bounds. While RL-STPA cannot provide formal guarantees for arbitrary neural policies, it offers a practical methodology for systematically evaluating and improving RL safety and robustness in safety-critical applications where exhaustive verification methods remain intractable.
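The iterative feedback loop the abstract describes, feeding identified hazards back into training via reward shaping, might look like the minimal sketch below. The hazard set, binning, and penalty magnitude are hypothetical choices for illustration; the paper's actual shaping and curriculum design may differ.

```python
def shaped_reward(base_reward, state, hazard_regions, penalty=-10.0):
    """Add a penalty whenever the agent enters a state bin that a prior
    RL-STPA analysis pass flagged as hazardous."""
    bin_key = (round(state[0]), round(state[1]))  # same coarse bins as the analysis
    if bin_key in hazard_regions:
        return base_reward + penalty
    return base_reward

# Bins flagged hazardous in a prior analysis pass (illustrative values).
hazards = {(2, 0), (2, -1)}
print(shaped_reward(1.0, (2.1, -0.2), hazards))  # flagged bin: prints -9.0
print(shaped_reward(1.0, (0.0, 0.5), hazards))   # safe bin: prints 1.0
```

On each checkpoint, retraining against the shaped reward is meant to steer the policy away from the flagged regions, after which the analysis can be re-run to measure whether those hazards recur.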