Hidden Clones: Exposing and Fixing Family Bias in Vision-Language Model Ensembles
arXiv cs.CV / 3/19/2026
Key Points
- The paper shows that ensembling Vision-Language Models from the same architectural family produces correlated errors, reducing ensemble diversity and creating a "Misleading" tier of questions where correlated majority errors can drive ensemble accuracy to 0% even when the best individual model is correct.
- It introduces three family-aware methods: Hierarchical Family Voting (HFV), which aggregates votes within each family before voting across families; QualRCCV, which weights models by calibration, family quality, and inverse family size; and Learned Candidate Scoring (LCS), which trains a cross-validated classifier to re-rank candidate answers using features such as support breadth, family diversity, and model quality.
- HFV recovers 18–26 percentage points on the Misleading tier; QualRCCV beats calibrated voting on all three benchmarks (p < 0.05); and LCS delivers the largest overall gains, with modest but consistent absolute improvements (+0.68% VQAv2, +0.61% TextVQA, +2.45% GQA) and no degradation on any benchmark.
- On the VQAv2 test-standard split (evaluated via EvalAI) with a 12-model ensemble, LCS reaches 87.83%, indicating strong generalization to held-out data.
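The HFV idea described above can be sketched in a few lines: vote within each family first (so a large family of correlated models gets one voice), then vote across the family-level answers. This is a minimal illustration, not the paper's implementation; the family names and answers below are hypothetical, and ties are broken arbitrarily by insertion order.

```python
from collections import Counter

def majority(answers):
    # Most common answer; ties broken by first-seen order in Counter.
    return Counter(answers).most_common(1)[0][0]

def hierarchical_family_vote(predictions):
    """Hierarchical Family Voting sketch.

    predictions: dict mapping family name -> list of per-model answers.
    Step 1: majority vote within each family (one vote per family).
    Step 2: majority vote across the family-level answers.
    """
    family_answers = [majority(models) for models in predictions.values()]
    return majority(family_answers)

# Hypothetical example: one large family making a correlated error.
preds = {
    "family_a": ["cat", "cat", "cat", "cat"],  # correlated wrong answers
    "family_b": ["dog", "dog"],
    "family_c": ["dog"],
}
# Flat voting over all 7 answers picks "cat" (4 vs 3);
# family-level voting picks "dog" (2 families vs 1).
print(hierarchical_family_vote(preds))
```

A flat majority vote over the same seven answers would return "cat", showing how family-level aggregation neutralizes correlated errors from a single over-represented family.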