Deep QP Safety Filter: Model-free Learning for Reachability-based Safety Filter

arXiv cs.RO / 4/15/2026


Key Points

  • Deep QP Safety Filter proposes a fully data-driven safety layer for black-box dynamical systems that learns a reachability-based Quadratic-Program (QP) safety filter without requiring the system model.
  • The approach combines Hamilton–Jacobi (HJ) reachability with model-free learning by designing contraction-based losses for both the safety value function and its derivatives, trained via two neural networks.
  • The authors claim that, in the exact setting, the learned critic converges to the viscosity solution (and its derivative), even when the underlying safety value function is non-smooth.
  • Experiments across multiple dynamical systems, including hybrid systems, and several RL tasks show reduced pre-convergence failures and faster learning toward higher returns versus strong baselines.
  • Overall, the work frames a principled and practical route to safer model-free control for reinforcement learning in settings where dynamics are unknown or only observable from data.
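
To make the idea of a QP safety filter concrete, here is a minimal sketch of the generic single-constraint case: project a reference action `u_ref` onto the half-space `a + b·u >= 0` with the smallest possible change. The constraint form and the names `qp_safety_filter`, `a`, and `b` are illustrative assumptions, not taken from the paper. In a learned filter one would expect `a` and `b` to be built from the learned safety value and its derivatives, but the paper's exact constraint is not reproduced here.

```python
from typing import List, Sequence


def qp_safety_filter(u_ref: Sequence[float], a: float, b: Sequence[float]) -> List[float]:
    """Solve min_u ||u - u_ref||^2 subject to a + b . u >= 0 in closed form.

    Hypothetical inputs: `a` and `b` stand in for whatever affine safety
    constraint a learned value function and its derivatives would supply.
    """
    # Constraint value at the reference action.
    margin = a + sum(bi * ui for bi, ui in zip(b, u_ref))
    if margin >= 0.0:
        # The reference action already satisfies the constraint: pass it through.
        return list(u_ref)

    b_norm_sq = sum(bi * bi for bi in b)
    if b_norm_sq == 0.0:
        # With b = 0 no action can change the constraint value.
        raise ValueError("constraint is infeasible: b is zero and a < 0")

    # Euclidean projection onto the boundary a + b . u = 0.
    scale = -margin / b_norm_sq
    return [ui + scale * bi for ui, bi in zip(u_ref, b)]


if __name__ == "__main__":
    # Unsafe reference action: it is shifted just enough to reach the boundary.
    print(qp_safety_filter([1.0, 0.0], a=-1.0, b=[0.0, 2.0]))
    # Safe reference action: returned unchanged.
    print(qp_safety_filter([0.0, 1.0], a=0.0, b=[0.0, 1.0]))
```

With one affine constraint the QP has this closed-form projection, so no solver is needed. Additional constraints such as actuator limits would generally call for a numerical QP solver.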

Abstract

We introduce Deep QP Safety Filter, a fully data-driven safety layer for black-box dynamical systems. Our method learns a Quadratic-Program (QP) safety filter without model knowledge by combining Hamilton-Jacobi (HJ) reachability with model-free learning. We construct contraction-based losses for both the safety value and its derivatives, and train two neural networks accordingly. In the exact setting, the learned critic converges to the viscosity solution (and its derivative), even for non-smooth values. Across diverse dynamical systems, including a hybrid system, and multiple RL tasks, Deep QP Safety Filter substantially reduces pre-convergence failures while accelerating learning toward higher returns than strong baselines, offering a principled and practical route to safe, model-free control.
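
As a rough illustration of what "contraction-based" means for a safety value, the sketch below applies a discounted safety backup of a kind commonly used in reachability-based RL to a toy five-state chain. It is a stand-in and not the paper's loss: the backup form, the chain dynamics `step`, the margins in `MARGIN`, and the discount `GAMMA` are all assumptions chosen for illustration. The point is only that one application of the backup shrinks the sup-norm gap between any two value tables by at least a factor of `GAMMA`.

```python
from typing import List

GAMMA = 0.9  # hypothetical discount factor
# Hypothetical safety margin l(x) on a 5-state chain; negative means failure.
MARGIN = [-1.0, 1.0, 2.0, 1.0, -1.0]


def step(x: int, a: int) -> int:
    """Hypothetical deterministic dynamics: move by a in {-1, 0, +1}, clipped to the chain."""
    return min(max(x + a, 0), len(MARGIN) - 1)


def backup(v: List[float]) -> List[float]:
    """Discounted safety backup: (1 - GAMMA) * l(x) + GAMMA * min(l(x), max_a v(x'))."""
    out = []
    for x, margin in enumerate(MARGIN):
        best_next = max(v[step(x, a)] for a in (-1, 0, 1))
        out.append((1.0 - GAMMA) * margin + GAMMA * min(margin, best_next))
    return out


def sup_gap(v: List[float], w: List[float]) -> float:
    """Sup-norm distance between two value tables."""
    return max(abs(p - q) for p, q in zip(v, w))


if __name__ == "__main__":
    v = [0.0] * 5
    w = [5.0, -3.0, 2.0, 7.0, -1.0]
    for _ in range(5):
        print(f"gap = {sup_gap(v, w):.4f}")
        v, w = backup(v), backup(w)
```

Because `min` and `max` are non-expansive, the printed gap decays at least geometrically regardless of initialization, which is the property that makes a fixed-point regression target well behaved. How the paper extends this to the value's derivatives is not covered by this sketch.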
