Synthesis and Deployment of Maximal Robust Control Barrier Functions through Adversarial Reinforcement Learning

arXiv cs.RO / 4/16/2026

Key Points

  • The paper proposes a new robust control barrier function (CBF) framework that targets maximal robust safe sets for general nonlinear systems with bounded uncertainty, addressing limitations of prior methods that only certify conservative subsets.
  • It shows that a safety value function solving the dynamic programming Isaacs equation can be used as a robust discrete-time CBF to enforce safety on the maximal robust safe set.
  • The authors introduce a reinforcement-learning-inspired “robust Q-CBF” that lifts the barrier certificate into state-action space, enabling safety filtering without requiring explicit closed-form system dynamics.
  • By combining this robust Q-CBF formulation with adversarial reinforcement learning, the method supports synthesis and deployment on black-box dynamics with unknown uncertainty structure.
  • Experiments on an inverted pendulum benchmark and a 36-D quadruped simulator demonstrate substantially less conservative safe sets on the pendulum and reliable safety enforcement under adversarial uncertainty on the quadruped.
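
To make the second key point concrete, a common discrete-time form of such a safety value function can be sketched as follows (the notation here is ours and is an assumption about the setup, not taken from the paper): the robust safety value $V$ satisfies an Isaacs fixed-point equation of the form

```latex
V(x) \;=\; \min\Big\{\, h(x),\; \max_{u \in \mathcal{U}} \,\min_{d \in \mathcal{D}} \, V\big(f(x, u, d)\big) \Big\},
```

where $h(x) \ge 0$ encodes the safety constraint, $u$ is the control input, and $d$ is the bounded disturbance. The maximal robust safe set is then $\{x : V(x) \ge 0\}$, and the paper's Q-function lift replaces the term $V(f(x,u,d))$ with a learned state-action value, removing the need for explicit access to $f$.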

Abstract

Robust control barrier functions (CBFs) provide a principled mechanism for smooth safety enforcement under worst-case disturbances. However, existing approaches typically rely on explicit, closed-form structure in the dynamics (e.g., control-affine) and uncertainty models. This has led to limited scalability and generality, with most robust CBFs certifying only conservative subsets of the maximal robust safe set. In this paper, we introduce a new robust CBF framework for general nonlinear systems under bounded uncertainty. We first show that the safety value function solving the dynamic programming Isaacs equation is a valid robust discrete-time CBF that enforces safety on the maximal robust safe set. We then adopt the key reinforcement learning (RL) notion of quality function (or Q-function), which removes the need for explicit dynamics by lifting the barrier certificate into state-action space and yields a novel robust Q-CBF constraint for safety filtering. Combined with adversarial RL, this enables the synthesis and deployment of robust Q-CBFs on general nonlinear systems with black-box dynamics and unknown uncertainty structure. We validate the framework on a canonical inverted pendulum benchmark and a 36-D quadruped simulator, achieving substantially less conservative safe sets than barrier-based baselines on the pendulum and reliable safety enforcement even under adversarial uncertainty realizations on the quadruped.
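
As a rough illustration of the safety-filtering idea described above, the following is a minimal sketch of how a Q-CBF-style filter might operate over a discretized action set. All names (`q_h`, `q_cbf_filter`) and the specific barrier condition `q_h(s, a) >= gamma * h(s)` are hypothetical assumptions for illustration, not the paper's actual formulation.

```python
def q_cbf_filter(q_h, state, nominal_action, candidate_actions, gamma=0.9):
    """Sketch of a Q-CBF safety filter over a finite candidate action set.

    q_h(s, a) is assumed to approximate the robust barrier value obtained
    by taking action a in state s. The state-level barrier value h(s) is
    taken as max_a q_h(s, a), and an action is deemed safe if it satisfies
    the (assumed) discrete-time condition q_h(s, a) >= gamma * h(s).
    """
    # State-level barrier value: the best achievable Q-barrier value.
    h_s = max(q_h(state, a) for a in candidate_actions)
    # Keep only actions satisfying the barrier condition.
    safe = [a for a in candidate_actions if q_h(state, a) >= gamma * h_s]
    if not safe:
        # Fallback: no action certifies safety; take the least-unsafe one.
        return max(candidate_actions, key=lambda a: q_h(state, a))
    # Among safe actions, stay as close as possible to the nominal policy.
    return min(safe, key=lambda a: abs(a - nominal_action))
```

The filter only overrides the nominal action when the barrier condition would otherwise be violated, which is the minimally invasive behavior typically sought from CBF-based safety filters.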