If you're waiting for a sign... that might not be it! Mitigating Trust Boundary Confusion from Visual Injections on Vision-Language Agentic Systems

arXiv cs.CV / 4/23/2026

📰 NewsDeveloper Stack & InfrastructureSignals & Early TrendsModels & Research

共有:

Key Points

The paper studies “trust boundary confusion” in embodied vision-language agentic systems, where legitimate in-scene signals (e.g., traffic lights) can be exploited through misleading visual injections.
It introduces a dual-intent dataset and evaluation framework, showing that current LVLM-based agents often either ignore useful cues or incorrectly follow harmful ones.
The authors benchmark 7 LVLM agents across multiple embodied environments, testing both structure-based and noise-based visual injection attacks.
To mitigate the vulnerability, they propose a multi-agent defense that separates perception from decision-making and dynamically assesses the reliability of visual inputs.
The proposed defense reduces misleading behaviors substantially while maintaining correct responses, and offers robustness guarantees under adversarial perturbations, with the code and artifacts released publicly.

Abstract

Recent advances in embodied Vision-Language Agentic Systems (VLAS), powered by large vision-language models (LVLMs), enable AI systems to perceive and reason over real-world scenes. Within this context, environmental signals such as traffic lights are essential in-band signals that can and should influence agent behavior. However, similar signals could also be crafted to operate as misleading visual injections, overriding user intent and posing security risks. This duality creates a fundamental challenge: agents must respond to legitimate environmental cues while remaining robust to misleading ones. We refer to this tension as trust boundary confusion. To study this behavior, we design a dual-intent dataset and evaluation framework, through which we show that current LVLM-based agents fail to reliably balance this trade-off, either ignoring useful signals or following harmful ones. We systematically evaluate 7 LVLM agents across multiple embodied settings under both structure-based and noise-based visual injections. To address these vulnerabilities, we propose a multi-agent defense framework that separates perception from decision-making to dynamically assess the reliability of visual inputs. Our approach significantly reduces misleading behaviors while preserving correct responses and provides robustness guarantees under adversarial perturbations. The code of the evaluation framework and artifacts are made available at https://anonymous.4open.science/r/Visual-Prompt-Inject.

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

Dev.to

Trajectory Forecasts in Unknown Environments Conditioned on Grid-Based Plans

Dev.to

Elevating Austria: Google invests in its first data center in the Alps.

Google Blog

10 AI Tools Every Developer Should Try in 2026

Dev.to

OpenAI Just Named It Workspace Agents. We Open-Sourced Our Lark Version Six Months Ago

Dev.to

If you're waiting for a sign... that might not be it! Mitigating Trust Boundary Confusion from Visual Injections on Vision-Language Agentic Systems

Key Points

Abstract

Related Articles

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

Trajectory Forecasts in Unknown Environments Conditioned on Grid-Based Plans

Elevating Austria: Google invests in its first data center in the Alps.

10 AI Tools Every Developer Should Try in 2026

OpenAI Just Named It Workspace Agents. We Open-Sourced Our Lark Version Six Months Ago

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer