RedVLA: Physical Red Teaming for Vision-Language-Action Models

arXiv cs.RO · April 27, 2026


Key Points

  • The paper introduces RedVLA, the first dedicated red-teaming framework aimed at detecting physical safety risks before deploying Vision-Language-Action (VLA) models in the real world.
  • RedVLA uses a two-stage pipeline: Risk Scenario Synthesis to create task-feasible initial risk scenes that entangle the risk factor with the model’s execution, and Risk Amplification to reliably elicit unsafe behaviors across different VLA models.
  • Risk Amplification is performed using gradient-free optimization iteratively refined by trajectory features, improving stability when testing heterogeneous models.
  • Experiments across six representative VLA models show RedVLA can discover diverse unsafe behaviors and reach an attack success rate (ASR) of up to 95.5% within 10 optimization iterations.
  • The authors also propose SimpleVLA-Guard, a lightweight safety guard trained using data generated by RedVLA, and release the data, assets, and code publicly.
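To make the first stage concrete, the sketch below illustrates one plausible way Risk Scenario Synthesis could work: identify interaction-heavy regions of a benign trajectory, then place the risk factor inside one of them so it becomes entangled with the model's execution flow. The paper does not specify the exact heuristic; using low end-effector speed as a proxy for interaction regions, and the function names `critical_regions` / `synthesize_risk_scene`, are assumptions for illustration.

```python
import numpy as np

def critical_regions(benign_trajectory, k=3):
    """Hypothetical heuristic: pick the k waypoints where the end-effector
    moves slowest, a common proxy for interaction-heavy regions
    (e.g., grasp or place points)."""
    positions = np.asarray(benign_trajectory, dtype=float)  # (T, 3) xyz waypoints
    speeds = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    idx = np.argsort(speeds)[:k]                            # slowest segments first
    return positions[idx]

def synthesize_risk_scene(benign_trajectory, risk_factor_size=0.05):
    """Place a risk object near a critical interaction region so that it
    entangles with the policy's execution flow while keeping the
    original task feasible."""
    regions = critical_regions(benign_trajectory)
    anchor = regions[0]
    # Offset slightly so the object sits inside the interaction region
    # without blocking the path outright (task remains feasible).
    placement = anchor + np.array([risk_factor_size, 0.0, 0.0])
    return placement
```

In this sketch the placement is deterministic given the trajectory; a real implementation would likely sample multiple candidate placements per region and validate task feasibility in simulation before running the second stage.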

Abstract

The real-world deployment of Vision-Language-Action (VLA) models remains limited by the risk of unpredictable and irreversible physical harm. However, we currently lack effective mechanisms to proactively detect these physical safety risks before deployment. To address this gap, we propose **RedVLA**, the first red-teaming framework for physical safety in VLA models. We systematically uncover unsafe behaviors through a two-stage process: (I) **Risk Scenario Synthesis** constructs a valid and task-feasible initial risk scene. Specifically, it identifies critical interaction regions from benign trajectories and positions the risk factor within these regions, aiming to entangle it with the VLA's execution flow and elicit a target unsafe behavior. (II) **Risk Amplification** ensures stable elicitation across heterogeneous models. It iteratively refines the risk factor state through gradient-free optimization guided by trajectory features. Experiments on six representative VLA models show that RedVLA uncovers diverse unsafe behaviors and achieves an ASR of up to 95.5% within 10 optimization iterations. To mitigate these risks, we further propose SimpleVLA-Guard, a lightweight safety guard built from RedVLA-generated data. Our data, assets, and code are available [here](https://redvla.github.io).
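The second stage, Risk Amplification, can be sketched as a simple gradient-free search over the risk factor's state: perturb the state, roll out the VLA model, score how unsafe the resulting trajectory is, and keep the perturbation if the unsafe score rises. This is a minimal random-search sketch, not the paper's optimizer; the names `amplify_risk`, `rollout`, and `score_unsafe` are assumptions, and the trajectory-feature guidance described in the paper is folded into the scoring function here.

```python
import random

def amplify_risk(rollout, score_unsafe, init_state, iters=10, step=0.02, seed=0):
    """Gradient-free refinement of the risk factor state.

    rollout(state)        -> trajectory produced by the VLA under that scene
    score_unsafe(traj)    -> scalar unsafe score derived from trajectory features
    Returns the most unsafe state found and its score.
    """
    rng = random.Random(seed)
    best_state = list(init_state)
    best_score = score_unsafe(rollout(best_state))
    for _ in range(iters):
        # Perturb each state dimension (e.g., object pose) by a small amount.
        candidate = [x + rng.uniform(-step, step) for x in best_state]
        score = score_unsafe(rollout(candidate))
        if score > best_score:
            # Keep the perturbation only if it elicits more unsafe behavior,
            # so the score is non-decreasing across iterations.
            best_state, best_score = candidate, score
    return best_state, best_score
```

Because the search uses only rollouts and scores, it needs no access to model gradients, which is what makes it applicable across heterogeneous VLA models; the reported 95.5% ASR is reached by the paper's method within 10 such iterations.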