Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms

arXiv cs.RO / 4/28/2026


Key Points

  • Vision-Language-Action (VLA) models are attracting attention as a unified foundation for embodied intelligence, and they raise a new class of safety challenges, including the possibility of physically irreversible consequences.
  • Key concerns specific to VLAs are an attack surface spanning multiple modalities (vision, language, and robot state), error propagation over long-horizon behavior, real-time latency constraints on defenses, and vulnerabilities in the data supply chain.
  • Because the literature is fragmented across robotic learning, adversarial machine learning, AI alignment, and autonomous systems safety, this survey organizes the field through the lenses of attacks, defenses, evaluation, and deployment to give an integrated picture of the research landscape.
  • Attacks and defenses are both organized along a timing axis (training-time vs. inference-time), covering data poisoning and backdoors, adversarial patches and cross-modal perturbations, semantic jailbreaks, and freezing attacks (see the sketch after this list).
  • Key open problems going forward include certified robustness for embodied trajectories, physically realizable defenses, safety-aware training, unified runtime safety architectures, and standardized evaluation.
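To make the two-axis framing concrete, here is a minimal Python sketch that encodes the threat classes named above together with their attack stage and a plausible defense stage. The stage pairings (e.g., that a backdoor planted at training time is best intercepted at inference time) are illustrative assumptions for demonstration, not classifications taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum

class Stage(Enum):
    TRAINING = "training-time"
    INFERENCE = "inference-time"

@dataclass(frozen=True)
class Threat:
    name: str
    attack_stage: Stage    # when the attack is mounted
    defense_stage: Stage   # when a mitigation can plausibly intervene (assumed pairing)

# Threat names follow the survey's abstract; the defense-stage column is illustrative.
TAXONOMY = [
    Threat("data poisoning",           Stage.TRAINING,  Stage.TRAINING),
    Threat("backdoor trigger",         Stage.TRAINING,  Stage.INFERENCE),
    Threat("adversarial patch",        Stage.INFERENCE, Stage.INFERENCE),
    Threat("cross-modal perturbation", Stage.INFERENCE, Stage.INFERENCE),
    Threat("semantic jailbreak",       Stage.INFERENCE, Stage.INFERENCE),
    Threat("freezing attack",          Stage.INFERENCE, Stage.INFERENCE),
]

for t in TAXONOMY:
    print(f"{t.name:<26} attack: {t.attack_stage.value:<15} defense: {t.defense_stage.value}")
```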

Abstract

Vision-Language-Action (VLA) models are emerging as a unified substrate for embodied intelligence. This shift raises a new class of safety challenges, stemming from the embodied nature of VLA systems, including irreversible physical consequences, a multimodal attack surface across vision, language, and state, real-time latency constraints on defense, error propagation over long-horizon trajectories, and vulnerabilities in the data supply chain. Yet the literature remains fragmented across robotic learning, adversarial machine learning, AI alignment, and autonomous systems safety. This survey provides a unified and up-to-date overview of safety in Vision-Language-Action models. We organize the field along two parallel timing axes, attack timing (training-time vs. inference-time) and defense timing (training-time vs. inference-time), linking each class of threat to the stage at which it can be mitigated. We first define the scope of VLA safety, distinguishing it from text-only LLM safety and classical robotic safety, and review the foundations of VLA models, including architectures, training paradigms, and inference mechanisms. We then examine the literature through four lenses: Attacks, Defenses, Evaluation, and Deployment. We survey training-time threats such as data poisoning and backdoors, as well as inference-time attacks including adversarial patches, cross-modal perturbations, semantic jailbreaks, and freezing attacks. We review training-time and runtime defenses, analyze existing benchmarks and metrics, and discuss safety challenges across six deployment domains. Finally, we highlight key open problems, including certified robustness for embodied trajectories, physically realizable defenses, safety-aware training, unified runtime safety architectures, and standardized evaluation.
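To illustrate the flavor of the runtime defenses the survey covers, the following is a minimal Python sketch of an action-level safety shield that rate-limits and workspace-projects a policy's output before execution. All names (`vla_policy`, `ACTION_LOW`/`ACTION_HIGH`, `WORKSPACE`) and the numeric limits are hypothetical assumptions for this sketch, not an interface or method described in the paper.

```python
import numpy as np

# Assumed per-step end-effector delta limits (meters) and an axis-aligned
# workspace box; real systems would derive these from the robot and task.
ACTION_LOW  = np.array([-0.05, -0.05, -0.05])
ACTION_HIGH = np.array([ 0.05,  0.05,  0.05])
WORKSPACE   = (np.array([0.2, -0.4, 0.0]), np.array([0.8, 0.4, 0.6]))

def shield(action: np.ndarray, ee_pos: np.ndarray) -> np.ndarray:
    """Clip the proposed action to rate limits and keep the end effector in bounds."""
    # 1) Rate limiting: bound per-step motion regardless of what the policy emits.
    action = np.clip(action, ACTION_LOW, ACTION_HIGH)
    # 2) Workspace projection: shrink the step so the next pose stays inside the box.
    lo, hi = WORKSPACE
    next_pos = np.clip(ee_pos + action, lo, hi)
    return next_pos - ee_pos

# Usage with a stand-in policy (a real system would query the VLA model here):
def vla_policy(observation):
    return np.array([0.30, 0.0, -0.10])  # deliberately out-of-limit action

ee_pos = np.array([0.75, 0.0, 0.05])
raw = vla_policy(observation=None)
safe = shield(raw, ee_pos)
print("raw:", raw, "-> safe:", safe)  # raw: [0.3 0. -0.1] -> safe: [0.05 0. -0.05]
```

Because the shield sits outside the model, it can still constrain behavior when the policy itself has been compromised (e.g., by a training-time backdoor), though it only enforces properties expressible at the action level.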