DataEvolver: Let Your Data Build and Improve Itself via Goal-Driven Loop Agents
arXiv cs.AI / 5/5/2026
Key Points
- The paper introduces DataEvolver, a closed-loop “visual data engine” that takes explicit goals and iterates through generation, inspection, correction, filtering, and export to produce controllable training data for image editing and multimodal understanding.
- DataEvolver is designed to manage multiple persistent artifact types, including RGB images, masks, depth/normal maps, meshes, poses, trajectories, and review traces.
- The system’s current release uses two coupled loops: in-sample self-correction during generation and cross-round self-expansion during dataset validation.
- Experiments on an image-level object-rotation task show that the paper's Ours+DualGate configuration, using a fixed Qwen-Edit LoRA probe, outperforms both an unadapted base model and a public multi-angle LoRA on SpatialEdit and on a held-out evaluation set.
- Ablation results show performance improving consistently as components are added, from scene-aware generation to feedback-driven correction to dual-gated validation; the paper frames its core contribution as a reusable dataset-building framework.
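
The two coupled loops described above can be illustrated with a toy sketch. This is not DataEvolver's implementation: the `Sample` class, the scalar `quality` score, the thresholds, and the two gates are all hypothetical stand-ins chosen for illustration, and the summary does not say what the paper's dual gate actually checks. The sketch only shows the control flow: an inner loop that inspects and corrects a single sample, and an outer loop that filters, exports accepted samples, and retries unmet goals in later rounds.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Sample:
    goal: str                 # explicit goal the sample should satisfy
    quality: float            # hypothetical inspection score in [0, 1]
    trace: List[str] = field(default_factory=list)  # review trace


def generate(goal: str, rng: random.Random) -> Sample:
    # Stand-in for a real generator: first drafts start at low quality.
    return Sample(goal=goal, quality=rng.uniform(0.0, 0.6))


def inspect(sample: Sample) -> float:
    # Stand-in for an inspector; here it simply reads the stored score.
    return sample.quality


def correct(sample: Sample, rng: random.Random) -> None:
    # Stand-in for feedback-driven correction: nudge quality upward
    # and record the step in the review trace.
    sample.quality = min(1.0, sample.quality + rng.uniform(0.1, 0.3))
    sample.trace.append("corrected")


def self_correct(goal: str, rng: random.Random,
                 threshold: float = 0.7, max_fixes: int = 3) -> Sample:
    # Inner loop (in-sample): inspect, and correct until good enough
    # or the correction budget is spent.
    sample = generate(goal, rng)
    for _ in range(max_fixes):
        if inspect(sample) >= threshold:
            break
        correct(sample, rng)
    return sample


def passes_gates(sample: Sample, threshold: float = 0.7,
                 max_corrections: int = 2) -> bool:
    # Assumed "dual gate": both checks must pass. The real gates may differ.
    quality_gate = inspect(sample) >= threshold
    effort_gate = len(sample.trace) <= max_corrections
    return quality_gate and effort_gate


def build_dataset(goals: List[str], rounds: int = 3,
                  seed: int = 0) -> Tuple[List[Sample], List[str]]:
    # Outer loop (cross-round): export samples that pass the gates,
    # carry unmet goals into the next round.
    rng = random.Random(seed)
    dataset: List[Sample] = []
    pending = list(goals)
    for _ in range(rounds):
        next_pending = []
        for goal in pending:
            sample = self_correct(goal, rng)
            if passes_gates(sample):
                dataset.append(sample)
            else:
                next_pending.append(goal)
        pending = next_pending
        if not pending:
            break
    return dataset, pending


if __name__ == "__main__":
    kept, unmet = build_dataset(["rotate chair 90 degrees", "rotate mug 45 degrees"])
    print(len(kept), "exported;", len(unmet), "goals still unmet")
```

Keeping the gate separate from the inner loop means a sample that needed too many corrections can still be rejected at export time, which is one plausible reading of why validation is described as its own stage.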