Disparities In Negation Understanding Across Languages In Vision-Language Models
arXiv cs.CL / 4/22/2026
📰 News · Signals & Early Trends · Models & Research
Key Points
- Vision-language models often show affirmation bias: they mistakenly select affirmative captions even when the correct answer requires negation (a minimal probe of this setup is sketched after this list).
- The study shows that negation behavior varies across languages with factors such as morphology, word order, and cliticization, which may limit how well existing fixes generalize.
- Researchers introduce the first human-verified multilingual negation benchmark covering seven diverse languages (English, Mandarin, Arabic, Greek, Russian, Tagalog, Spanish).
- Evaluation of CLIP, SigLIP, and MultiCLIP finds that standard CLIP is at or below chance on non-Latin-script languages, while MultiCLIP delivers the highest and most consistent accuracy.
- A proposed negation-correction method (SpaceVLM) improves results in multiple languages, but its effectiveness varies across typologically different languages, highlighting fairness-relevant interactions between language properties and model improvements.
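To make the affirmation-bias setup concrete, here is a minimal sketch of the kind of caption-selection probe the key points describe, written against the Hugging Face transformers CLIP API. The checkpoint name is a real public model, but the image path and caption pair are illustrative placeholders, not items from the paper's benchmark.

```python
# Minimal affirmation-bias probe for a CLIP-style model (a sketch, not the
# paper's benchmark harness). "street_scene.jpg" and the captions below are
# hypothetical placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("street_scene.jpg")  # hypothetical image of a street with no dog in it
captions = [
    "a street with a dog",   # affirmative distractor
    "a street with no dog",  # correct, negated caption
]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)

# Affirmation bias shows up as the affirmative caption getting the higher
# score even though the negated caption matches the image.
for caption, p in zip(captions, probs[0]):
    print(f"{p.item():.3f}  {caption}")
```

Running the same probe with translated caption pairs captures the spirit of the multilingual evaluation: a model with affirmation bias ranks the affirmative caption higher regardless of what the image actually shows.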