RL-STPA: Adapting System-Theoretic Hazard Analysis for Safety-Critical Reinforcement Learning
arXiv cs.LG / 4/17/2026
Key Points
- The paper argues that current RL evaluation methods struggle in safety-critical settings due to neural-network “black-box” behavior and distribution shift between training and real deployment.
- It proposes RL-STPA, an adaptation of STPA (System-Theoretic Process Analysis) for reinforcement learning that combines hierarchical subtask decomposition, temporal phase analysis informed by domain expertise, and explicit attention to emergent behavior.
- RL-STPA adds coverage-guided perturbation testing to probe sensitivity across state-action spaces, helping uncover hazardous loss scenarios that standard evaluations may miss.
- The framework uses iterative checkpoints to feed detected hazards back into training via reward shaping and curriculum design, aiming to improve safety and robustness over time.
- Experiments on autonomous drone navigation and landing show RL-STPA can identify safety-relevant failure modes while acknowledging that it does not offer formal guarantees for arbitrary neural policies.
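The coverage-guided perturbation testing described above can be illustrated with a minimal sketch. The function below is a hypothetical illustration, not the paper's implementation: it probes a policy's sensitivity by applying small deterministic perturbations along each state dimension and recording the worst-case change in the resulting action, so that states near brittle decision boundaries stand out.

```python
import numpy as np

def perturbation_sensitivity(policy, state, eps=0.1):
    """Worst-case change in the policy's action under small
    per-dimension perturbations of the state (a sensitivity score).
    High scores flag state-space regions where behavior may shift
    hazardously under distribution shift."""
    a0 = policy(state)
    worst = 0.0
    for i in range(len(state)):
        for sign in (+eps, -eps):
            d = np.zeros_like(state)
            d[i] = sign
            worst = max(worst, abs(policy(state + d) - a0))
    return worst

# Toy stand-in policy with a sharp decision boundary at state[0] == 0.5
# (hypothetical; a real setup would query a trained neural policy):
policy = lambda s: 1.0 if s[0] > 0.5 else -1.0

states = [np.array([0.1, 0.0]), np.array([0.5, 0.0]), np.array([0.9, 0.0])]
scores = [perturbation_sensitivity(policy, s) for s in states]
hazardous = [sc > 1.0 for sc in scores]  # only the boundary state is flagged
```

In a full pipeline along the lines the paper sketches, flagged states could then feed back into training, e.g. as extra penalty terms in the reward or as additional curriculum samples drawn near the hazardous regions.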

