SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning

arXiv cs.RO / 4/21/2026


Key Points

  • SafeVLA (SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning) proposes a framework that explicitly integrates safety requirements into vision-language-action (VLA) models, which unify vision, language, and action.
  • Concretely, the integrated safety approach (ISA) models safety requirements, actively elicits diverse unsafe behaviors, and folds those risks into the VLA policy via constrained learning (safe reinforcement learning under the CMDP paradigm).
  • The VLA is optimized against the elicited safety risks from a min-max perspective, aiming for an effective trade-off between safety performance and task success rate.
  • In experiments on long-horizon mobile manipulation tasks, the method is reported to reduce the cumulative cost of safety violations by 83.58% compared with the state-of-the-art method, while also maintaining task success rate (+3.85%).
  • The paper further reports mitigation of long-tail risks, handling of extreme failure scenarios, and robust generalization of the learned safety behaviors to out-of-distribution (OOD) perturbations.
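The constrained-learning setup in the bullets above follows the standard CMDP template. As a point of reference (the symbols $J_r$, $J_c$, and $d$ are the usual CMDP notation, not taken verbatim from the paper), the generic Lagrangian saddle-point form is:

$$\max_{\pi}\; \min_{\lambda \ge 0}\; J_r(\pi) \;-\; \lambda \bigl( J_c(\pi) - d \bigr),$$

where $J_r(\pi)$ is the expected return, $J_c(\pi)$ the expected cumulative safety cost, and $d$ the cost budget. The paper's min-max view optimizes the policy against elicited safety risks under such a constraint.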

Abstract

Vision-language-action models (VLAs) show potential as generalist robot policies. However, these models pose extreme safety challenges during real-world deployment, including the risk of harm to the environment, the robot itself, and humans. How can safety constraints be explicitly integrated into VLAs? We address this by exploring an integrated safety approach (ISA), systematically modeling safety requirements, then actively eliciting diverse unsafe behaviors, effectively constraining VLA policies via safe reinforcement learning, and rigorously assuring their safety through targeted evaluations. Leveraging the constrained Markov decision process (CMDP) paradigm, ISA optimizes VLAs from a min-max perspective against elicited safety risks. Thus, policies aligned through this comprehensive approach achieve the following key features: (I) effective safety-performance trade-offs, reducing the cumulative cost of safety violations by 83.58% compared to the state-of-the-art method, while also maintaining task success rate (+3.85%). (II) strong safety assurance, with the ability to mitigate long-tail risks and handle extreme failure scenarios. (III) robust generalization of learned safety behaviors to various out-of-distribution perturbations. The effectiveness is evaluated on long-horizon mobile manipulation tasks. Our data, models and newly proposed benchmark environment are available at https://pku-safevla.github.io.
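To make the CMDP-style constrained optimization concrete, here is a minimal primal-dual sketch on a 1-D toy problem. This is a generic Lagrangian safe-RL update, not SafeVLA's actual training loop; all names (`grad_reward`, `cost_limit`, the toy objectives) are illustrative assumptions.

```python
# Generic primal-dual (Lagrangian) update for a constrained objective,
# as used in CMDP-based safe RL. Toy problem, not the paper's algorithm:
#   maximize J_r(theta) = -(theta - 2)^2   subject to   J_c(theta) = theta^2 <= 1.

def primal_dual_step(theta, lam, grad_reward, grad_cost, cost, cost_limit,
                     lr_theta=0.05, lr_lam=0.1):
    """Ascend the Lagrangian in theta; ascend the constraint violation in lam."""
    theta = theta + lr_theta * (grad_reward(theta) - lam * grad_cost(theta))
    lam = max(0.0, lam + lr_lam * (cost(theta) - cost_limit))  # dual stays >= 0
    return theta, lam

grad_reward = lambda th: -2.0 * (th - 2.0)  # d/dtheta of -(theta-2)^2
grad_cost = lambda th: 2.0 * th             # d/dtheta of theta^2
cost = lambda th: th ** 2

theta, lam = 0.0, 0.0
for _ in range(500):
    theta, lam = primal_dual_step(theta, lam, grad_reward, grad_cost, cost,
                                  cost_limit=1.0)
```

The unconstrained optimum (theta = 2) violates the cost budget, so the dual variable grows until the iterates settle near the constrained optimum theta ≈ 1, where the cost meets the limit. SafeVLA's actual objective additionally takes a min-max view over actively elicited unsafe behaviors, which this toy does not model.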