Edge Reliability Gap in Vision-Language Models: Quantifying Failure Modes of Compressed VLMs Under Visual Corruption
arXiv cs.CV · March 31, 2026
Key Points
- The paper studies whether compressed/quantized vision-language models fail in qualitatively different ways from larger FP16 VLMs under visual corruption, rather than simply achieving lower accuracy.
- It compares a 4-bit quantized 7B model (Qwen2.5-VL-7B, NF4) against a 500M FP16 model (SmolVLM2-500M) on 4,000 samples from VQAv2 and COCO, using a three-part error taxonomy: Object Blindness, Semantic Drift, and Prior Bias (a typical NF4 loading setup is sketched after this list).
- Semantic Drift is identified as the dominant failure mode on VQAv2 for both models and, on COCO, for Qwen only; Prior Bias appears on VQAv2 but is absent on COCO for both.
- The compact model shows a significantly larger “negation collapse” on compositional negation probes, driven largely by COCO (a statistically significant 12.5 percentage-point gap); one template (false_yn) reveals an extreme bias toward answering “Yes” on COCO for SmolVLM2.
- The authors evaluate confidence calibration via Expected Calibration Error (ECE), include blur-robustness experiments, and release a fully reproducible pipeline intended for systematic safety auditing before edge deployment (minimal ECE and blur-corruption sketches follow below).
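
The summary does not include the paper's loading code, but 4-bit NF4 quantization of a model like Qwen2.5-VL-7B is usually done through Hugging Face transformers with bitsandbytes. The sketch below is an assumption about that setup, not the authors' pipeline; the checkpoint name, compute dtype, and device mapping are illustrative.

```python
# Hedged sketch: loading Qwen2.5-VL-7B with 4-bit NF4 weight quantization
# via transformers + bitsandbytes. All settings here are assumptions;
# the paper's exact configuration may differ.
import torch
from transformers import (
    AutoProcessor,
    BitsAndBytesConfig,
    Qwen2_5_VLForConditionalGeneration,
)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4 weights
    bnb_4bit_compute_dtype=torch.float16,  # matmuls still run in FP16
)

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"   # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```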
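
For the calibration metric, ECE bins predictions by confidence and averages the gap between per-bin accuracy and mean confidence: ECE = Σ_m (|B_m|/n) · |acc(B_m) − conf(B_m)|. A minimal NumPy implementation with equal-width bins (the bin count is an assumption; the paper may bin differently) could look like this:

```python
# Minimal Expected Calibration Error (ECE) with equal-width bins.
# Lower is better: a perfectly calibrated model has per-bin accuracy
# equal to mean confidence in every bin.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # half-open bins (lo, hi]; a confidence of exactly 0.0 falls
        # outside, which is negligible in practice
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            acc = correct[in_bin].mean()       # empirical accuracy in bin
            conf = confidences[in_bin].mean()  # mean predicted confidence
            ece += in_bin.mean() * abs(acc - conf)
    return ece

# Example: overconfident predictions produce a nonzero ECE.
print(expected_calibration_error([0.9, 0.8, 0.95, 0.6], [1, 0, 1, 1]))
```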
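
For the blur-robustness experiments, a corruption sweep typically renders each image at several blur severities and re-runs the VQA queries. The paper's exact corruption protocol is not given in this summary; the Pillow-based sketch below, with assumed Gaussian-blur radii as severity levels, illustrates the idea.

```python
# Hedged sketch: generating blur-corrupted variants of an image with
# Pillow. The radii used as severity levels are assumptions.
from PIL import Image, ImageFilter

def blur_variants(path, radii=(1, 2, 4, 8)):
    """Return {radius: blurred image} for each assumed severity level."""
    img = Image.open(path).convert("RGB")
    return {r: img.filter(ImageFilter.GaussianBlur(radius=r)) for r in radii}

# Usage: feed each variant through the VLM and compare answers
# against the clean-image baseline.
# variants = blur_variants("coco_example.jpg")
```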