AI Navigate

Altered Thoughts, Altered Actions: Probing Chain-of-Thought Vulnerabilities in VLA Robotic Manipulation

arXiv cs.AI / 3/16/2026


Key Points

  • The study probes vulnerabilities in Vision-Language-Action (VLA) models that use chain-of-thought reasoning by corrupting only the internal reasoning trace, with all inputs left intact, and measuring the impact on robotic manipulation performance (a sketch of this intervention point follows this list).
  • Researchers introduce a taxonomy of seven text corruptions across three attacker tiers (blind noise, mechanical-semantic, and LLM-adaptive) and evaluate them on 40 LIBERO tabletop tasks.
  • Substituting object names in the reasoning trace significantly reduces success rates (about 8.3 percentage points overall, up to 19.3 pp on goal-conditioned tasks and 45 pp on some individual tasks), while other corruptions have negligible impact.
  • The results imply the action decoder relies more on entity-reference grounding than on the quality or sequential structure of the reasoning trace.
  • A sophisticated LLM-based attacker can be less effective than simple object-name substitution; the vulnerability is specific to reasoning-augmented models, whereas instruction-level attacks degrade reasoning and non-reasoning architectures alike.
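
The threat model in the first bullet can be pictured as a hook on the text channel between the reasoning module and the action decoder. Below is a minimal Python sketch of that intervention point; the class and method names (`ReasoningVLA`, `generate_plan`, `decode_action`) are illustrative assumptions, not the paper's actual interface.

```python
from typing import Callable

class ReasoningVLA:
    """Stand-in for a CoT-style VLA: plan in natural language, then decode motor commands.

    Hypothetical interface for illustration; not the paper's API.
    """

    def generate_plan(self, image, instruction: str) -> str:
        # Reasoning module: emits the natural-language plan (the CoT trace).
        raise NotImplementedError

    def decode_action(self, image, instruction: str, plan: str):
        # Action decoder: conditions on the inputs *and* the intermediate plan.
        raise NotImplementedError

def attacked_step(model: ReasoningVLA, image, instruction: str,
                  corrupt: Callable[[str], str]):
    """One control step with the reasoning trace corrupted in transit.

    The inputs (image, instruction) are left intact; only the internal
    text channel between reasoner and decoder is rewritten.
    """
    plan = model.generate_plan(image, instruction)  # clean inputs
    tampered = corrupt(plan)                        # the attacker's only access point
    return model.decode_action(image, instruction, tampered)
```

Because `corrupt` never touches the image or the instruction, this channel stays invisible to defenses that only validate inputs.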

Abstract

Recent Vision-Language-Action (VLA) models increasingly adopt chain-of-thought (CoT) reasoning, generating a natural-language plan before decoding motor commands. This internal text channel between the reasoning module and the action decoder has received no adversarial scrutiny. We ask: which properties of this intermediate plan does the action decoder actually rely on, and can targeted corruption of the reasoning trace alone, with all inputs left intact, degrade a robot's physical task performance? We design a taxonomy of seven text corruptions organized into three attacker tiers (blind noise, mechanical-semantic, and LLM-adaptive) and apply them to a state-of-the-art reasoning VLA across 40 LIBERO tabletop manipulation tasks. Our results reveal a striking asymmetry: substituting object names in the reasoning trace reduces overall success rate by 8.3 percentage points (pp), reaching −19.3 pp on goal-conditioned tasks and −45 pp on individual tasks, whereas sentence reordering, spatial-direction reversal, token noise, and even a 70B-parameter LLM crafting plausible-but-wrong plans all have negligible impact (within ±4 pp). This asymmetry indicates that the action decoder depends on entity-reference integrity rather than reasoning quality or sequential structure. Notably, a sophisticated LLM-based attacker underperforms simple mechanical object-name substitution, because preserving plausibility inadvertently retains the entity-grounding structure the decoder needs. A cross-architecture control using a non-reasoning VLA confirms the vulnerability is exclusive to reasoning-augmented models, while instruction-level attacks degrade both architectures, establishing that the internal reasoning trace is a distinct and stealthy threat vector invisible to input-validation defenses.
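
For concreteness, the blind-noise and mechanical-semantic tiers of the corruption taxonomy could be implemented along the following lines. This is a minimal sketch: the object vocabulary, substitution rule, and noise rate are assumptions for illustration, not the paper's exact procedures.

```python
import random

# Illustrative tabletop vocabulary (assumed; not LIBERO's actual object list).
OBJECTS = ["bowl", "plate", "mug", "drawer", "book"]

def substitute_object_names(plan: str, rng: random.Random) -> str:
    """Mechanical-semantic tier: swap each known object name for a decoy.

    This is the corruption that breaks entity-reference grounding.
    """
    out = []
    for word in plan.split():
        name = word.strip(".,").lower()
        if name in OBJECTS:
            decoy = rng.choice([o for o in OBJECTS if o != name])
            word = word.lower().replace(name, decoy)
        out.append(word)
    return " ".join(out)

def reorder_sentences(plan: str, rng: random.Random) -> str:
    """Mechanical-semantic tier: shuffle sentence order.

    Destroys sequential structure but leaves entity references intact.
    """
    sentences = [s.strip() for s in plan.split(".") if s.strip()]
    rng.shuffle(sentences)
    return ". ".join(sentences) + "."

def token_noise(plan: str, rng: random.Random, rate: float = 0.1) -> str:
    """Blind-noise tier: drop a random fraction of tokens."""
    return " ".join(w for w in plan.split() if rng.random() > rate)

if __name__ == "__main__":
    rng = random.Random(0)
    plan = "Move to the bowl. Grasp the bowl. Place it on the plate."
    for corrupt in (substitute_object_names, reorder_sentences, token_noise):
        print(corrupt.__name__, "->", corrupt(plan, rng))
```

The reported asymmetry is visible in the code itself: only substitute_object_names rewrites entity references, while the other corruptions preserve them even as they scramble order or drop tokens.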