CBV: Clean-label Backdoor Attacks on Vision Language Models via Diffusion Models
arXiv cs.AI / 5/5/2026
📰 News · Ideas & Deep Analysis · Models & Research
Key Points
- The paper reports that vision-language models (VLMs) are vulnerable to backdoor attacks, and that prior methods typically poison training data with visual triggers and altered text labels, producing detectable image-text mismatches.
- It proposes CBV (Clean-Label Backdoor Attack on VLMs via Diffusion Models), which generates natural-looking poisoned samples by steering the reverse diffusion process with a modified score derived via score matching (a minimal sketch of one guided sampling step follows this list).
- CBV further improves attack effectiveness by using multimodal guidance that incorporates text information derived from the triggered images during generation.
- To increase stealth, the method introduces a GradCAM-guided mask (GM) that confines perturbations to the most semantically important regions rather than the whole image (see the mask sketch after this list).
- Experiments on MSCOCO and VQA v2 with four representative VLMs show an attack success rate (ASR) above 80% while leaving normal model functionality largely intact (the ASR metric is sketched below).
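To make the score-steering idea concrete, here is a minimal sketch of one guided reverse-diffusion step, assuming a DDPM-style ancestral sampler. `score_model`, `trigger_guidance`, `betas`, and `lam` are hypothetical names, not the paper's API; `trigger_guidance` stands in for any differentiable objective measuring trigger expression, which per the paper's multimodal guidance could combine image- and text-derived signals.

```python
import torch

def guided_reverse_step(x_t, t, score_model, trigger_guidance, betas, lam=0.5):
    """One DDPM-style ancestral sampling step with a shifted score.
    score_model(x, t) approximates grad_x log p_t(x); trigger_guidance(x, t)
    is any differentiable scalar objective scoring trigger expression.
    All names here are illustrative assumptions, not the paper's code."""
    base_score = score_model(x_t, t).detach()

    # Guidance gradient: sampling from p_t(x) * exp(lam * f(x)) shifts the
    # score by lam * grad f(x) -- the score-matching view of guided sampling.
    with torch.enable_grad():
        x_in = x_t.detach().requires_grad_(True)
        g = torch.autograd.grad(trigger_guidance(x_in, t).sum(), x_in)[0]
    score = base_score + lam * g

    # Standard ancestral update:
    # x_{t-1} = (x_t + beta_t * s) / sqrt(1 - beta_t) + sqrt(beta_t) * z
    beta_t = float(betas[t])
    mean = (x_t + beta_t * score) / (1.0 - beta_t) ** 0.5
    noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    return mean + (beta_t ** 0.5) * noise
```

Because the guidance enters only as an additive score term, the sampler stays close to the natural image manifold, which is what keeps the poisoned samples clean-label and hard to spot.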
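The GradCAM-guided mask can be sketched with standard PyTorch hooks. This is generic Grad-CAM, not the paper's code; `target_layer`, `target_idx`, and the `keep_frac` threshold are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def gradcam_mask(model, target_layer, image, target_idx, keep_frac=0.25):
    """Grad-CAM saliency thresholded into a binary mask over the top
    `keep_frac` most important pixels. `target_layer` should be a late
    conv layer of `model`; all names here are illustrative."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(v=o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(v=go[0]))

    logits = model(image)              # image: (1, 3, H, W)
    logits[0, target_idx].backward()   # gradient w.r.t. the chosen output
    h1.remove(); h2.remove()

    # Channel weights = spatially pooled gradients; CAM = weighted activations.
    w = grads["v"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((w * acts["v"]).sum(dim=1, keepdim=True))    # (1, 1, h, w)
    cam = F.interpolate(cam, size=image.shape[-2:],
                        mode="bilinear", align_corners=False)

    # Binary mask: keep only the top-keep_frac saliency region.
    thresh = torch.quantile(cam.flatten(), 1.0 - keep_frac)
    return (cam >= thresh).float()     # (1, 1, H, W), 1 = perturb here

# The perturbation is then confined to salient regions, e.g.:
#   poisoned = image + gradcam_mask(model, layer, image, idx) * delta
```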
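Finally, ASR is conventionally the fraction of triggered inputs that elicit the attacker's target output. A generic stand-in for the paper's exact protocol, assuming a hypothetical `model.generate` API and substring matching against the target text:

```python
def attack_success_rate(model, triggered_loader, target_text):
    """ASR = fraction of triggered inputs whose generated output contains
    the attacker's target. `model.generate` is a hypothetical VLM API."""
    hits, total = 0, 0
    for images, prompts in triggered_loader:
        outputs = model.generate(images, prompts)
        hits += sum(target_text in out for out in outputs)
        total += len(outputs)
    return hits / total
```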