Uncovering Linguistic Fragility in Vision-Language-Action Models via Diversity-Aware Red Teaming

arXiv cs.RO / 4/8/2026


Key Points

  • The paper addresses a safety gap for Vision-Language-Action (VLA) robotic models by focusing on how linguistic nuances can trigger unexpected or catastrophic behaviors in embodied agents.
  • It argues that standard RL-based red teaming can suffer from mode collapse, limiting adversaries to repetitive failure patterns and thereby missing a broader set of meaningful vulnerabilities.
  • The proposed DAERT (Diversity-Aware Embodied Red Teaming) framework uses a diversity-aware uniform policy to generate a wide variety of challenging linguistic instructions while maintaining attack effectiveness, measured via execution failures in a physical simulator.
  • Experiments on multiple robotic benchmarks against two state-of-the-art VLAs (π0 and OpenVLA) show the method finds substantially more effective adversarial instructions, dropping average task success rate from 93.33% to 5.85%.
  • Overall, DAERT is presented as a scalable approach for stress-testing VLA agents to uncover safety blind spots prior to real-world deployment.
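
The diversity-aware selection idea behind the framework can be sketched as follows. This is an illustrative toy only: the function names, the token-level Jaccard-distance heuristic, and the stubbed simulator check are assumptions for demonstration, not the paper's actual implementation.

```python
def jaccard_distance(a: str, b: str) -> float:
    """Token-level Jaccard distance between two instructions."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return 1.0 - len(sa & sb) / len(sa | sb)

def run_in_simulator(instruction: str) -> bool:
    """Stub: returns True if the VLA policy fails the task under this
    instruction. A real harness would roll out the policy in a physics
    simulator and check task success."""
    # Hypothetical failure triggers, invented for this sketch.
    return "leftmost" in instruction or "fragile" in instruction

def diversity_aware_red_team(candidates, k=3, min_dist=0.5):
    """Greedily keep up to k effective attacks that are mutually
    diverse (pairwise Jaccard distance >= min_dist)."""
    effective = [c for c in candidates if run_in_simulator(c)]
    selected = []
    for c in effective:
        if all(jaccard_distance(c, s) >= min_dist for s in selected):
            selected.append(c)
        if len(selected) == k:
            break
    return selected

candidates = [
    "pick up the red block",
    "pick up the leftmost red block",
    "grab the leftmost red block",   # near-duplicate of the previous attack
    "place the fragile cup on the shelf",
]
attacks = diversity_aware_red_team(candidates)
# Keeps one attack per "mode"; the near-duplicate phrasing is filtered out.
```

The diversity constraint is what distinguishes this from a plain reward-maximizing search: effective but redundant rephrasings of an already-found attack are discarded in favor of attacks that probe different linguistic vulnerabilities.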

Abstract

Vision-Language-Action (VLA) models have achieved remarkable success in robotic manipulation. However, their robustness to linguistic nuances remains a critical, under-explored concern that poses a significant safety risk to real-world deployment. Red teaming, or identifying environmental scenarios that elicit catastrophic behaviors, is an important step in ensuring the safe deployment of embodied AI agents. Reinforcement learning (RL) has emerged as a promising approach for automated red teaming that aims to uncover these vulnerabilities. However, standard RL-based adversaries often suffer from severe mode collapse due to their reward-maximizing nature: they tend to converge to a narrow set of trivial or repetitive failure patterns and fail to reveal the comprehensive landscape of meaningful risks. To bridge this gap, we propose a novel Diversity-Aware Embodied Red Teaming (DAERT) framework to expose the vulnerabilities of VLAs to linguistic variations. Our design is based on a uniform policy that generates a diverse set of challenging instructions while maintaining attack effectiveness, measured by execution failures in a physical simulator. We conduct extensive experiments across different robotic benchmarks against two state-of-the-art VLAs, π0 and OpenVLA. Our method consistently discovers a wider range of more effective adversarial instructions that reduce the average task success rate from 93.33% to 5.85%, demonstrating a scalable approach to stress-testing VLA agents and exposing critical safety blind spots before real-world deployment.
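
The mode-collapse problem the abstract describes can be made concrete with a toy contrast between a greedy, reward-maximizing adversary and a uniform policy over effective attacks. All attack-mode names and reward values below are invented for demonstration and do not come from the paper.

```python
import random

# Hypothetical linguistic attack modes with per-mode attack reward
# (numbers invented for this sketch).
attack_modes = {"synonym_swap": 0.9, "spatial_referent": 0.8,
                "distractor_clause": 0.7, "negation": 0.6}

def reward_maximizing_adversary(n):
    """Greedy adversary: always emits the argmax-reward attack mode,
    so every sample collapses onto a single failure pattern."""
    best = max(attack_modes, key=attack_modes.get)
    return [best] * n

def uniform_adversary(n, rng):
    """Uniform policy over the set of effective attacks: every mode
    is sampled with equal probability."""
    return [rng.choice(list(attack_modes)) for _ in range(n)]

rng = random.Random(0)
greedy = reward_maximizing_adversary(100)
uniform = uniform_adversary(100, rng)
print(len(set(greedy)))   # 1 distinct mode -> mode collapse
print(len(set(uniform)))  # 4 distinct modes -> full coverage
```

Both adversaries are "effective" on every sample in this toy, but only the uniform one surfaces all four failure modes, which is the coverage property DAERT's diversity-aware design targets.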