VisualLeakBench: Auditing the Fragility of Large Vision-Language Models against PII Leakage and Social Engineering

arXiv cs.CV / 3/17/2026

📰 NewsIdeas & Deep AnalysisModels & Research

共有:

Key Points

VisualLeakBench introduces an evaluation suite to audit LVLMs for OCR injection and contextual PII leakage using 1,000 synthetic adversarial images across 8 PII types, plus 50 real-world screenshots for validation.
The study benchmarks four frontier LVLMs (GPT-5.2, Claude 4, Gemini-3 Flash, Grok-4) and reports OCR and PII leakage rates with Wilson 95% confidence intervals, noting tradeoffs between OCR robustness and PII leakage.
Claude 4 shows the lowest OCR leakage (14.2% ASR) but the highest PII leakage (74.4%), indicating a comply-then-warn pattern in its responses.
Grok-4 achieves the lowest PII leakage at 20.4%, highlighting model-to-model variability in privacy leakage.
A defensive system prompt significantly reduces PII leakage across models (e.g., Claude 4 drops to 2.2%), though effectiveness varies by model and data type, with Gemini-3 Flash remaining vulnerable on synthetic data; real-world tests show mitigation effects can be template-sensitive.
The authors release the dataset and code for reproducible safety evaluation of deployment-relevant vision-language systems.

Abstract

As Large Vision-Language Models (LVLMs) are increasingly deployed in agent-integrated workflows and other deployment-relevant settings, their robustness against semantic visual attacks remains under-evaluated -- alignment is typically tested on explicit harmful content rather than privacy-critical multimodal scenarios. We introduce VisualLeakBench, an evaluation suite to audit LVLMs against OCR Injection and Contextual PII Leakage using 1,000 synthetically generated adversarial images with 8 PII types, validated on 50 in-the-wild (IRL) real-world screenshots spanning diverse visual contexts. We evaluate four frontier systems (GPT-5.2, Claude~4, Gemini-3 Flash, Grok-4) with Wilson 95% confidence intervals. Claude~4 achieves the lowest OCR ASR (14.2%) but the highest PII ASR (74.4%), exhibiting a comply-then-warn pattern -- where verbatim data disclosure precedes any safety-oriented language. Grok-4 achieves the lowest PII ASR (20.4%). A defensive system prompt eliminates PII leakage for two models, reduces Claude~4's leakage from 74.4% to 2.2%, but has no effect on Gemini-3 Flash on synthetic data. Strikingly, IRL validation reveals Gemini-3 Flash does respond to mitigation on real-world images (50% to 0%), indicating that mitigation robustness is template-sensitive rather than uniformly absent. We release our dataset and code for reproducible robustness and safety evaluation of deployment-relevant vision-language systems.

💡 Insights using this article

This article is featured in our daily AI news digest — key takeaways and action items at a glance.

📅 3/19WeeklyView insight →📅 3/17DailyView insight →

How CVE-2026-25253 exposed every OpenClaw user to RCE — and how to fix it in one command

Dev.to

Does Synthetic Data Generation of LLMs Help Clinical Text Mining?

Dev.to

What CVE-2026-25253 Taught Me About Building Safe AI Assistants

Dev.to

Day 52: Building vs Shipping — Why We Had 711 Commits and 0 Users

Dev.to

The Dawn of the Local AI Era: From iPhone 17 Pro to the Future of NVIDIA RTX

Dev.to

VisualLeakBench: Auditing the Fragility of Large Vision-Language Models against PII Leakage and Social Engineering

Key Points

Abstract

💡 Insights using this article

Related Articles

How CVE-2026-25253 exposed every OpenClaw user to RCE — and how to fix it in one command

Does Synthetic Data Generation of LLMs Help Clinical Text Mining?

What CVE-2026-25253 Taught Me About Building Safe AI Assistants

Day 52: Building vs Shipping — Why We Had 711 Commits and 0 Users

The Dawn of the Local AI Era: From iPhone 17 Pro to the Future of NVIDIA RTX

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer