Deep QP Safety Filter: Model-free Learning for Reachability-based Safety Filter
arXiv cs.RO / 4/15/2026
Key Points
- Deep QP Safety Filter proposes a fully data-driven safety layer for black-box dynamical systems that learns a reachability-based Quadratic-Program (QP) safety filter without requiring the system model.
- The approach combines Hamilton–Jacobi (HJ) reachability with model-free learning by designing contraction-based losses for both the safety value function and its derivatives, trained via two neural networks.
- The authors claim that, in the exact setting, the learned critic converges to the viscosity solution (and derivative) even when the underlying safety value is non-smooth.
- Experiments across multiple dynamical systems, including hybrid systems, and several RL tasks show reduced pre-convergence failures and faster learning toward higher returns versus strong baselines.
- Overall, the work frames a principled and practical route to safer model-free control for reinforcement learning in settings where dynamics are unknown or only observable from data.
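The core mechanism in the key points — a QP that minimally modifies a nominal (e.g. RL policy) action subject to a safety constraint derived from a learned value function — can be sketched for the simple case of one linear constraint, where the QP has a closed-form projection solution. This is a generic illustration of the filter structure, not the authors' implementation; the function and variable names (`qp_safety_filter`, `grad_V_u`, `margin`) are hypothetical, and in the paper both the value and its derivative would come from the two learned critic networks.

```python
import numpy as np

def qp_safety_filter(u_nom, grad_V_u, margin):
    """Solve  min_u ||u - u_nom||^2  s.t.  grad_V_u . u >= margin.

    With a single linear constraint, the QP reduces to projecting the
    nominal action onto the safe half-space when it is violated.
    grad_V_u stands in for the learned derivative of the safety value
    with respect to the control (hypothetical interface)."""
    slack = grad_V_u @ u_nom - margin
    if slack >= 0:
        return u_nom  # nominal action already satisfies the constraint
    # Project onto the boundary of the half-space {u : grad_V_u . u = margin}
    return u_nom - slack * grad_V_u / (grad_V_u @ grad_V_u)

# Example: nominal action violates the constraint, filter pushes it back
u_safe = qp_safety_filter(np.array([1.0, 0.0]),
                          np.array([0.0, 1.0]),
                          margin=0.5)
```

In the general case (action bounds, multiple constraints) a QP solver would replace the closed-form projection, but the minimally-invasive structure is the same.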
Related Articles
Are gamers being used as free labeling labor? The rise of "Simulators" that look like AI training grounds [D]
Reddit r/MachineLearning

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.
Dev.to

Failure to Reproduce Modern Paper Claims [D]
Reddit r/MachineLearning
Why don’t they just use Mythos to fix all the bugs in Claude Code?
Reddit r/LocalLLaMA