Leveraging Vision-Language Models as Weak Annotators in Active Learning
arXiv cs.CV / 5/4/2026
Key Points
- The paper explores using vision-language models (VLMs) to reduce human annotation cost in active learning by generating weak labels instead of fully labeling every sample.
- It finds that VLM label reliability depends strongly on granularity: in fine-grained recognition, VLMs struggle to produce correct fine-grained labels but can generate accurate coarse-grained ones.
- The authors propose an active learning framework that assigns labels instance-wise, combining a limited budget of fine-grained human annotations with coarse-grained VLM-generated weak labels (see the sketch after this list).
- They also account for systematic noise in the VLM-generated labels by calibrating with a small set of trusted full (human) labels.
- Experiments on CUB200 and FGVC-Aircraft show the approach consistently outperforms prior active learning methods under the same annotation budget.
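
Below is a minimal NumPy sketch of the two mechanisms the bullets describe, under stated assumptions: every name here (`vlm_coarse_label`, `FINE_TO_COARSE`, the 85% simulated VLM accuracy, the uncertainty scores) is illustrative rather than from the paper, and the noise handling shown is standard forward loss correction via a transition matrix estimated on the trusted set, which may differ from the authors' exact calibration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy label spaces (hypothetical sizes): 8 fine classes grouped into
# 3 coarse classes, e.g. bird species -> bird family.
N_FINE, N_COARSE = 8, 3
FINE_TO_COARSE = np.array([0, 0, 0, 1, 1, 1, 2, 2])

def vlm_coarse_label(true_fine):
    """Simulated VLM annotator: returns the correct coarse label 85% of
    the time, otherwise a random one. Stands in for a real VLM query."""
    if rng.random() < 0.85:
        return int(FINE_TO_COARSE[true_fine])
    return int(rng.integers(N_COARSE))

def assign_labels(uncertainty, true_fine, human_budget):
    """Instance-wise budget split: the `human_budget` most uncertain
    samples get a fine-grained human label (simulated by an oracle);
    the rest get a cheap coarse-grained VLM weak label."""
    order = np.argsort(uncertainty)[::-1]          # most uncertain first
    labels = []
    for rank, idx in enumerate(order):
        if rank < human_budget:
            labels.append((int(idx), int(true_fine[idx]), "fine"))
        else:
            labels.append((int(idx), vlm_coarse_label(true_fine[idx]), "coarse"))
    return labels

def estimate_transition(trusted_fine, trusted_vlm, n_fine, n_coarse):
    """T[i, j] ~ P(VLM outputs coarse j | true fine class i), estimated
    by counting on the small trusted (human-labeled) set."""
    T = np.full((n_fine, n_coarse), 1e-6)          # smoothing
    for y, y_vlm in zip(trusted_fine, trusted_vlm):
        T[y, y_vlm] += 1.0
    return T / T.sum(axis=1, keepdims=True)

def forward_corrected_nll(fine_probs, y_coarse, T):
    """Forward loss correction: push the model's fine-grained predictive
    distribution through T, then score the observed coarse VLM label."""
    coarse_probs = fine_probs @ T                  # shape (n_coarse,)
    return float(-np.log(coarse_probs[y_coarse]))

# Tiny end-to-end demo on synthetic data.
true_fine = rng.integers(N_FINE, size=20)
uncertainty = rng.random(20)                       # e.g. predictive entropy
labels = assign_labels(uncertainty, true_fine, human_budget=5)

trusted = rng.integers(N_FINE, size=50)
T = estimate_transition(trusted, [vlm_coarse_label(y) for y in trusted],
                        N_FINE, N_COARSE)
loss = forward_corrected_nll(np.full(N_FINE, 1 / N_FINE), y_coarse=0, T=T)
print(f"{sum(t == 'fine' for *_, t in labels)} human labels, "
      f"example corrected loss = {loss:.3f}")
```

Spending the human budget on the most uncertain samples mirrors classic uncertainty sampling, while the transition matrix turns systematically noisy coarse labels into a usable training signal by scoring the model's fine-grained prediction in coarse-label space.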