DataEvolver: Let Your Data Build and Improve Itself via Goal-Driven Loop Agents

arXiv cs.AI / 5/5/2026


Key Points

  • The paper introduces DataEvolver, a closed-loop “visual data engine” that uses explicit goals and iterative generation–inspection–correction–filtering–export to create controllable training data for image editing and multimodal understanding.
  • DataEvolver is designed to manage multiple persistent artifact types, including RGB images, masks, depth/normal maps, meshes, poses, trajectories, and review traces.
  • The system’s current release uses two coupled loops: in-sample self-correction during generation and cross-round self-expansion during dataset validation (a minimal sketch of the per-sample loop follows this list).
  • Experiments on an image-level object-rotation task show that the full Ours+DualGate configuration, evaluated with a fixed Qwen-Edit LoRA probe, outperforms an unadapted base model and a public multi-angle LoRA on both SpatialEdit and a held-out evaluation set.
  • Ablation results indicate a consistent performance improvement path from scene-aware generation to feedback-driven correction and dual-gated validation, with the core contribution framed as a reusable dataset-building framework.
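
To make the generate–inspect–correct–accept cycle concrete, here is a minimal, hypothetical Python sketch of the per-sample loop. None of the names below (`Sample`, `Goal`, `generate`, `correct`, `build_sample`) or the toy "angle" artifact come from the paper; they stand in for whatever scene-aware generator, bounded corrective action, and inspection check DataEvolver actually uses.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple

# Hypothetical sketch of a generation-time self-correction loop; the toy "angle"
# stands in for the paper's richer artifacts (RGB, masks, depth, meshes, poses).

@dataclass
class Sample:
    angle: float                                      # stand-in artifact
    review_trace: list = field(default_factory=list)  # persisted inspection feedback

@dataclass
class Goal:
    description: str
    check: Callable[[Sample], Tuple[bool, float]]     # returns (accepted, signed error)

def generate(target: float) -> Sample:
    """Toy stand-in for scene-aware generation: a noisy first attempt."""
    return Sample(angle=target + random.uniform(-15, 15))

def correct(sample: Sample, error: float) -> Sample:
    """Toy stand-in for a bounded corrective action driven by inspection feedback."""
    sample.angle -= 0.8 * error                       # move partway toward the goal
    return sample

def build_sample(goal: Goal, target: float, max_corrections: int = 3) -> Optional[Sample]:
    """Generation-time self-correction: inspect, correct, and re-check within a budget."""
    sample = generate(target)
    for _ in range(max_corrections + 1):
        accepted, error = goal.check(sample)
        sample.review_trace.append(error)
        if accepted:
            return sample                             # passes inspection -> exportable
        sample = correct(sample, error)
    return None                                       # budget exhausted -> filtered out

if __name__ == "__main__":
    target = 30.0
    goal = Goal("rotate object to ~30 degrees",
                check=lambda s: (abs(s.angle - target) < 2.0, s.angle - target))
    print(build_sample(goal, target))
```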

Abstract

Constructing controllable visual data is a major bottleneck for image editing and multimodal understanding. Useful supervision is rarely produced by a single rendering pass; instead it emerges through iterative generation, inspection, correction, filtering, and export. We present DataEvolver, a closed-loop visual data engine that organizes this process around explicit goals, persistent artifacts, bounded corrective actions, and acceptance decisions. DataEvolver supports multiple artifact types, including RGB images, masks, depth maps, normal maps, meshes, poses, trajectories, and review traces. In the current release, the system operates through two coupled loops: generation-time self-correction within each sample and validation-time self-expansion across dataset rounds. We validate the framework in an image-level object-rotation setting. With a fixed Qwen-Edit LoRA probe, our final Ours+DualGate model outperforms both the unadapted base model and a public multi-angle LoRA on SpatialEdit and a held-out evaluation set. Ablations show a consistent improvement path from scene-aware generation to feedback-driven correction and dual-gated validation. Beyond the released rotation data, our main contribution is a reusable framework for building visual datasets through explicit goal tracking, review, correction, and acceptance loops.
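
The abstract's second loop, validation-time self-expansion with dual-gated acceptance, could be driven by something like the sketch below. This is speculative: the summary does not say what the two gates measure, so `geometry_gate` and `semantic_gate` are placeholder names, and the rule for carrying unmet goals into the next round is an assumption, not the paper's documented behavior.

```python
from typing import Callable, Iterable, List, Optional

# Speculative sketch of a cross-round self-expansion loop; gate names and the
# goal-retry rule are assumptions, not DataEvolver's actual interfaces.

def dual_gate(sample: object,
              geometry_gate: Callable[[object], bool],
              semantic_gate: Callable[[object], bool]) -> bool:
    """Accept a candidate only if both (hypothetical) gates pass it."""
    return geometry_gate(sample) and semantic_gate(sample)

def evolve_dataset(goals: Iterable[object],
                   build_sample: Callable[[object], Optional[object]],
                   geometry_gate: Callable[[object], bool],
                   semantic_gate: Callable[[object], bool],
                   rounds: int = 3) -> List[object]:
    """Validation-time self-expansion: build, gate, and revisit unmet goals each round."""
    dataset: List[object] = []
    pending = list(goals)
    for _ in range(rounds):
        still_unmet = []
        for goal in pending:
            sample = build_sample(goal)
            if sample is not None and dual_gate(sample, geometry_gate, semantic_gate):
                dataset.append(sample)       # export accepted artifacts
            else:
                still_unmet.append(goal)     # assumption: retry unmet goals next round
        if not still_unmet:
            break
        pending = still_unmet
    return dataset
```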