V-OCBF: Learning Safety Filters from Offline Data via Value-Guided Offline Control Barrier Functions

arXiv cs.RO / 4/3/2026



Key Points

The paper proposes V-OCBF (Value-Guided Offline Control Barrier Functions), a framework for learning safety filters from offline demonstrations to achieve strict state-wise safety without online interaction.
Unlike prior Safe Offline RL methods that focus on soft expected-cost constraints, V-OCBF learns a neural control barrier function designed to enforce forward invariance.
The method is model-free. It does not require a model of the system dynamics. Instead, it derives a recursive finite-difference barrier update that propagates safety information over time.
V-OCBF uses an expectile-based objective that avoids querying the barrier on out-of-distribution actions and restricts updates to actions supported by the offline dataset.
The learned barrier is integrated into a real-time controller via a Quadratic Program (QP), and the authors report fewer safety violations than baselines while retaining strong task performance across multiple case studies.
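The expectile-based objective above can be illustrated generically. This is a minimal sketch that assumes the asymmetric squared loss popularized by implicit Q-learning (IQL). The summary does not give V-OCBF's exact objective, and the function names and toy per-state targets below are hypothetical.

With τ = 0.5 the expectile equals the mean. As τ approaches 1 it moves toward the largest sample. Because only values that actually appear in the data are used, an "in-support maximum" is approximated without ever evaluating an out-of-distribution action.

```python
from typing import Union

import numpy as np


def expectile_loss(target: np.ndarray, pred: Union[np.ndarray, float], tau: float) -> float:
    """Mean asymmetric squared loss |tau - 1(u < 0)| * u**2 with u = target - pred.

    For tau > 0.5, under-estimates (u > 0) cost more than over-estimates,
    so the minimizer is pulled toward the upper tail of the targets.
    """
    u = target - pred
    weight = np.where(u < 0, 1.0 - tau, tau)
    return float(np.mean(weight * u ** 2))


def fit_expectile(samples: np.ndarray, tau: float, iters: int = 100) -> float:
    """Return the tau-expectile of samples (the scalar minimizing expectile_loss).

    Fixed-point iteration m = sum(w * x) / sum(w), where each weight w is
    tau for samples above m and 1 - tau for samples below m.
    """
    m = float(np.mean(samples))
    for _ in range(iters):
        w = np.where(samples < m, 1.0 - tau, tau)
        m_new = float(np.sum(w * samples) / np.sum(w))
        converged = abs(m_new - m) < 1e-12
        m = m_new
        if converged:
            break
    return m


# Hypothetical barrier targets for the actions the dataset contains at one state.
targets = np.array([0.1, 0.4, 0.9])
for tau in (0.5, 0.9, 0.99):
    print(tau, round(fit_expectile(targets, tau), 4))
```

As τ increases, the printed value rises from the mean toward the largest target (0.9) but never exceeds it. Only in-dataset values ever enter the estimate.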

Abstract

Ensuring safety in autonomous systems requires controllers that aim to satisfy state-wise constraints without relying on online interaction. While existing Safe Offline RL methods typically enforce soft expected-cost constraints, they struggle to ensure strict state-wise safety. Conversely, Control Barrier Functions (CBFs) offer a principled mechanism to enforce forward invariance, but often rely on expert-designed barrier functions or knowledge of the system dynamics. We introduce Value-Guided Offline Control Barrier Functions (V-OCBF), a framework that learns a neural CBF entirely from offline demonstrations. Unlike prior approaches, V-OCBF does not assume access to the dynamics model; instead, it derives a recursive finite-difference barrier update, enabling model-free learning of a barrier that propagates safety information over time. Moreover, V-OCBF incorporates an expectile-based objective that avoids querying the barrier on out-of-distribution actions and restricts updates to the dataset-supported action set. The learned barrier is then used with a Quadratic Program (QP) formulation to synthesize real-time safe control. Across multiple case studies, V-OCBF yields substantially fewer safety violations than baseline methods while maintaining strong task performance, highlighting its scalability for offline synthesis of safety-critical controllers without online interaction or hand-engineered barriers.
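To make the QP step concrete, here is a minimal sketch of the standard single-constraint CBF-QP safety filter. It assumes the barrier condition is affine in the control, written as `a @ u + b >= 0`, as in the textbook control-affine case dB/dt + alpha * B >= 0. The summary does not say how V-OCBF forms this constraint from its model-free barrier, so `a`, `b`, `alpha`, and the toy 1-D system are hypothetical.

With a single affine constraint, the QP has a closed-form solution: a Euclidean projection onto a half-space. No solver library is needed. With several constraints or input bounds, one would call a QP solver instead.

```python
import numpy as np


def cbf_qp_filter(u_ref: np.ndarray, a: np.ndarray, b: float) -> np.ndarray:
    """Solve  min_u ||u - u_ref||^2  subject to  a @ u + b >= 0.

    The minimizer is the Euclidean projection of u_ref onto the half-space.
    Here a and b are hypothetical inputs standing in for the barrier condition.
    """
    slack = float(a @ u_ref + b)
    if slack >= 0.0:
        # The nominal (task) action already satisfies the barrier condition.
        return u_ref.copy()
    norm_sq = float(a @ a)
    if norm_sq == 0.0:
        raise ValueError("constraint a @ u + b >= 0 is infeasible (a = 0, b < 0)")
    # Move u_ref along a just far enough to reach the boundary a @ u + b = 0.
    return u_ref - (slack / norm_sq) * a


# Toy example (hypothetical): single integrator x' = u with safe set x <= x_max
# and barrier B(x) = x_max - x. The condition dB/dt + alpha * B >= 0 becomes
# -u + alpha * (x_max - x) >= 0, i.e. a = [-1], b = alpha * (x_max - x).
x, x_max, alpha = 0.8, 1.0, 2.0
u_task = np.array([1.5])  # nominal controller pushes toward the boundary
u_safe = cbf_qp_filter(u_task, a=np.array([-1.0]), b=alpha * (x_max - x))
print(u_safe)
```

The filter leaves the task action untouched whenever it already satisfies the condition. Otherwise it makes the smallest change that restores the condition. This minimal-intervention property is what lets such a filter sit on top of an arbitrary task policy.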
