Deep QP Safety Filter: Model-free Learning for Reachability-based Safety Filter
arXiv cs.RO / 4/15/2026
Key Points
- Deep QP Safety Filter proposes a fully data-driven safety layer for black-box dynamical systems that learns a reachability-based Quadratic-Program (QP) safety filter without requiring the system model.
- The approach combines Hamilton–Jacobi (HJ) reachability with model-free learning by designing contraction-based losses for both the safety value function and its derivatives, trained via two neural networks.
- The authors claim that, in the exact setting, the learned critic converges to the viscosity solution (and derivative) even when the underlying safety value is non-smooth.
- Experiments across multiple dynamical systems, including hybrid systems, and several RL tasks show reduced pre-convergence failures and faster learning toward higher returns versus strong baselines.
- Overall, the work frames a principled and practical route to safer model-free control for reinforcement learning in settings where dynamics are unknown or only observable from data.
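The core mechanism in the key points — a QP that minimally modifies a nominal (e.g. RL policy) action subject to a safety constraint derived from a learned value function — can be sketched for the simple case of one linear constraint, where the QP has a closed-form projection solution. This is a generic illustration of the filter structure, not the authors' implementation; the function and variable names (`qp_safety_filter`, `grad_V_u`, `margin`) are hypothetical, and in the paper both the value and its derivative would come from the two learned critic networks.

```python
import numpy as np

def qp_safety_filter(u_nom, grad_V_u, margin):
    """Solve  min_u ||u - u_nom||^2  s.t.  grad_V_u . u >= margin.

    With a single linear constraint, the QP reduces to projecting the
    nominal action onto the safe half-space when it is violated.
    grad_V_u stands in for the learned derivative of the safety value
    with respect to the control (hypothetical interface)."""
    slack = grad_V_u @ u_nom - margin
    if slack >= 0:
        return u_nom  # nominal action already satisfies the constraint
    # Project onto the boundary of the half-space {u : grad_V_u . u = margin}
    return u_nom - slack * grad_V_u / (grad_V_u @ grad_V_u)

# Example: nominal action violates the constraint, filter pushes it back
u_safe = qp_safety_filter(np.array([1.0, 0.0]),
                          np.array([0.0, 1.0]),
                          margin=0.5)
```

In the general case (action bounds, multiple constraints) a QP solver would replace the closed-form projection, but the minimally-invasive structure is the same.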
Related Articles
Are gamers being used as free labeling labor? The rise of "Simulators" that look like AI training grounds [D]
Reddit r/MachineLearning

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.
Dev.to

Failure to Reproduce Modern Paper Claims [D]
Reddit r/MachineLearning
Why don’t they just use Mythos to fix all the bugs in Claude Code?
Reddit r/LocalLLaMA