AutoVDC: Automated Vision Data Cleaning Using Vision-Language Models
arXiv cs.RO / 5/1/2026
💬 Opinion · Tools & Practical Usage · Models & Research
Key Points
- The AutoVDC framework uses vision-language models (VLMs) to automatically detect incorrect annotations in vision datasets, aiming to reduce manual dataset-cleaning effort.
- The study evaluates AutoVDC on autonomous-driving object-detection benchmarks (KITTI and nuImages) and creates dataset variants with intentionally injected annotation errors to measure detection performance.
- Experiments compare error-detection effectiveness across different VLMs and examine how fine-tuning VLMs affects the cleaning pipeline.
- Results indicate strong error detection and improved data-cleaning outcomes, suggesting AutoVDC can raise the reliability and accuracy of large-scale production datasets for autonomous driving.
- The work targets the common problem that human labeling is imperfect and often requires multiple costly review iterations to reach usable dataset quality.
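The paper's exact error-injection procedure is not detailed here, but the evaluation idea — corrupting a fraction of annotations so that an automated cleaner's detections can be scored against known ground truth — can be sketched as follows. This is a minimal illustration; the function name `inject_errors`, the annotation schema, and the two corruption modes (label swap, box shift) are assumptions for the sketch, not the authors' implementation.

```python
import random

def inject_errors(annotations, classes, error_rate=0.1, seed=0):
    """Create a corrupted copy of a dataset's annotations.

    Each annotation is a dict like {"label": str, "bbox": [x, y, w, h]}.
    With probability `error_rate`, an annotation is corrupted either by
    swapping its class label or by shifting its bounding box.
    Returns (corrupted_annotations, error_flags), where error_flags marks
    which entries were corrupted -- the ground truth a cleaner is scored on.
    """
    rng = random.Random(seed)
    corrupted, flags = [], []
    for ann in annotations:
        ann = {"label": ann["label"], "bbox": list(ann["bbox"])}  # copy
        if rng.random() < error_rate:
            if rng.random() < 0.5:
                # Label swap: replace the class with a different one.
                ann["label"] = rng.choice([c for c in classes if c != ann["label"]])
            else:
                # Box shift: offset the box by half its own size.
                x, y, w, h = ann["bbox"]
                ann["bbox"] = [x + w / 2, y + h / 2, w, h]
            flags.append(True)
        else:
            flags.append(False)
        corrupted.append(ann)
    return corrupted, flags
```

With the injected flags as ground truth, a detector's flagged annotations can be scored with standard precision/recall, which is presumably how the KITTI and nuImages variants are used in the evaluation.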