Consistent but Dangerous: Per-Sample Safety Classification Reveals False Reliability in Medical Vision-Language Models
arXiv cs.CV / 3/24/2026
Key Points
- The paper argues that using paraphrase consistency as a proxy for reliability in medical vision-language models is fundamentally flawed because models can remain perfectly consistent while ignoring the input image and relying on text patterns.
- It introduces a four-quadrant per-sample safety taxonomy (Ideal, Fragile, Dangerous, Worst) that evaluates both consistency across paraphrased prompts and whether predictions depend on the image.
- Experiments on five medical VLM configurations across two chest X-ray datasets (MIMIC-CXR and PadChest) show that LoRA fine-tuning can sharply reduce prediction flip rates while moving most samples into the “Dangerous” category, indicating false reliability.
- “Dangerous” samples can still be highly accurate (up to 99.6%) with low entropy, meaning confidence-based screening may miss the image-ignoring failure mode.
- The authors recommend that deployment evaluations combine consistency checks with a text-only baseline (e.g., an additional forward pass without the image) to detect this trap efficiently; a classification sketch follows these key points.
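The taxonomy can be approximated per sample from two signals: whether predictions agree across paraphrased prompts, and whether removing the image changes the answer. Below is a minimal Python sketch, assuming “Dangerous” denotes the consistent-but-image-ignoring quadrant and “Fragile” the inconsistent-but-image-dependent one; the majority-vote image-dependence proxy and all function and variable names are illustrative, not the paper’s exact criteria.

```python
from collections import Counter

def classify_sample(preds_with_image, pred_text_only):
    """Assign one sample to a safety quadrant.

    preds_with_image: the model's predictions for the same image under
        several paraphrased prompts.
    pred_text_only: prediction from an extra forward pass with the image
        removed (the text-only baseline the authors recommend).
    """
    # Consistency: every paraphrase yields the same answer.
    consistent = len(set(preds_with_image)) == 1

    # Image-dependence proxy: the majority with-image answer differs from
    # the text-only answer, i.e. the image actually changed the output.
    majority = Counter(preds_with_image).most_common(1)[0][0]
    image_dependent = majority != pred_text_only

    if consistent and image_dependent:
        return "Ideal"
    if not consistent and image_dependent:
        return "Fragile"
    if consistent and not image_dependent:
        return "Dangerous"  # stable answers that ignore the image
    return "Worst"
```

Under this reading, the text-only comparison costs only one extra forward pass per sample, which is why pairing it with standard paraphrase-consistency checks is presented as an efficient way to surface the image-ignoring failure mode.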