When Vision-Language Models Judge Without Seeing: Exposing Informativeness Bias
arXiv cs.AI / 4/21/2026
📰 News · Ideas & Deep Analysis · Models & Research
Key Points
- The paper argues that vision-language models used as judges often rely on answer “informativeness” rather than actually attending to image content, reducing evaluation reliability.
- It names this flaw “informativeness bias”: judges favor answers that merely appear richer and more detailed, even when those answers conflict with what the image shows.
- The authors propose BIRCH, a two-step judging paradigm that first corrects candidate answers for inconsistencies with the image and then compares candidates against this corrected, image-grounded anchor (see the sketch after these key points).
- Experiments across multiple models and benchmarks show BIRCH can reduce informativeness bias by up to 17% and improve judging performance by up to 9.8%.
- The work claims current VLM-as-a-Judge systems overlook a fundamental design issue and calls for more principled, image-faithful evaluation methods.
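The two-step idea in the third key point can be summarized in a few lines. The sketch below is not the paper's implementation: the function names `vlm_correct` and `vlm_compare` are hypothetical placeholders for calls to a vision-language model, and seeding the anchor from the first candidate is an assumption made purely for illustration.

```python
from typing import Callable, List


def birch_judge(
    image,                      # the image the answers are supposed to describe
    candidates: List[str],      # candidate answers to be ranked
    vlm_correct: Callable,      # hypothetical: VLM call that rewrites an answer to match the image
    vlm_compare: Callable,      # hypothetical: VLM call that scores candidate-vs-anchor agreement
) -> str:
    """Pick the candidate most consistent with an image-grounded anchor."""
    # Step 1: build an image-grounded anchor by correcting a candidate's
    # inconsistencies with the image (which candidate seeds the anchor is
    # an assumption here, not something the summary specifies).
    anchor = vlm_correct(image, candidates[0])

    # Step 2: compare every candidate against the corrected anchor and pick
    # the one that agrees with it most, rather than the most "informative" one.
    scores = [vlm_compare(image, anchor, c) for c in candidates]
    return candidates[scores.index(max(scores))]
```

Grounding the comparison in an anchor that has already been reconciled with the image is what is meant to keep the judge from defaulting to the longest or most detailed answer.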
Related Articles

To what extent could AI replace us in our jobs? Sometimes I think people exaggerate a bit.
Reddit r/artificial

Magnificent irony as Meta staff unhappy about running surveillance software on work PCs
The Register

ETHENEA (ETHENEA Americas LLC) Analyst View: Asset Allocation Resilience in the 2026 Global Macro Cycle
Dev.to

DEEPX and Hyundai Are Building Generative AI Robots
Dev.to

Stop Paying OpenAI to Read Garbage: The Two-Stage Agent Pipeline
Dev.to