VisualLeakBench: Auditing the Fragility of Large Vision-Language Models against PII Leakage and Social Engineering
arXiv cs.CV / 3/17/2026
📰 NewsIdeas & Deep AnalysisModels & Research
Key Points
- VisualLeakBench introduces an evaluation suite to audit LVLMs for OCR injection and contextual PII leakage using 1,000 synthetic adversarial images across 8 PII types, plus 50 real-world screenshots for validation.
- The study benchmarks four frontier LVLMs (GPT-5.2, Claude 4, Gemini-3 Flash, Grok-4) and reports OCR and PII leakage rates with Wilson 95% confidence intervals, noting tradeoffs between OCR robustness and PII leakage.
- Claude 4 shows the lowest OCR leakage (14.2% ASR) but the highest PII leakage (74.4%), indicating a comply-then-warn pattern in its responses.
- Grok-4 achieves the lowest PII leakage at 20.4%, highlighting model-to-model variability in privacy leakage.
- A defensive system prompt significantly reduces PII leakage across models (e.g., Claude 4 drops to 2.2%), though effectiveness varies by model and data type, with Gemini-3 Flash remaining vulnerable on synthetic data; real-world tests show mitigation effects can be template-sensitive.
- The authors release the dataset and code for reproducible safety evaluation of deployment-relevant vision-language systems.
💡 Insights using this article
This article is featured in our daily AI news digest — key takeaways and action items at a glance.
Related Articles
How CVE-2026-25253 exposed every OpenClaw user to RCE — and how to fix it in one command
Dev.to
Does Synthetic Data Generation of LLMs Help Clinical Text Mining?
Dev.to
What CVE-2026-25253 Taught Me About Building Safe AI Assistants
Dev.to
Day 52: Building vs Shipping — Why We Had 711 Commits and 0 Users
Dev.to
The Dawn of the Local AI Era: From iPhone 17 Pro to the Future of NVIDIA RTX
Dev.to