ScaleEdit-12M: Scaling Open-Source Image Editing Data Generation via Multi-Agent Framework

arXiv cs.CV / 3/24/2026

Key Points

The paper introduces ScaleEditor, a fully open-source hierarchical multi-agent framework designed to generate large-scale, diverse, and high-quality instruction-based image editing datasets without relying on costly proprietary APIs.
The end-to-end pipeline combines (1) source image expansion with world-knowledge infusion, (2) adaptive multi-agent instruction-image synthesis, and (3) task-aware data quality verification to improve edit realism and generalizability.
Using ScaleEditor, the authors curate ScaleEdit-12M, reported as the largest open-source image editing dataset to date, covering 23 task families across both real and synthetic domains.
Fine-tuning UniWorld-V1 and Bagel on ScaleEdit shows consistent performance improvements, including up to 10.4% on ImgEdit and 35.1% on GEdit for general editing benchmarks, and up to 150.0% on RISE and 26.5% on KRIS-Bench for knowledge-infused benchmarks.
The authors argue that these results show open-source agentic dataset pipelines can approach commercial-grade data quality while remaining cost-effective and scalable. They plan to open-source both the framework and the dataset.
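The three pipeline stages named in the key points can be pictured with a minimal sketch. This is not the authors' implementation: every name here (`EditSample`, `expand_sources`, `synthesize`, `verify`, `build_dataset`, `run_demo`), the task-family labels, and the thresholds are hypothetical, and the "agents" are plain callables standing in for whatever open-source models the framework actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EditSample:
    source_image: str          # hypothetical image path or ID
    task_family: str           # hypothetical task label
    instruction: str = ""
    edited_image: str = ""
    score: float = 0.0


Agent = Callable[[EditSample], str]
Judge = Callable[[EditSample], float]


def expand_sources(images: List[str], task_families: List[str]) -> List[EditSample]:
    """Stage 1 (assumed form): pair every source image with every task family."""
    return [EditSample(img, task) for img in images for task in task_families]


def synthesize(sample: EditSample, instruct: Agent, edit: Agent) -> EditSample:
    """Stage 2 (assumed form): one agent writes the instruction, another applies it."""
    sample.instruction = instruct(sample)
    sample.edited_image = edit(sample)
    return sample


def verify(samples: List[EditSample], judge: Judge,
           thresholds: Dict[str, float], default: float = 0.5) -> List[EditSample]:
    """Stage 3 (assumed form): keep samples whose score meets a per-task threshold."""
    kept = []
    for s in samples:
        s.score = judge(s)
        if s.score >= thresholds.get(s.task_family, default):
            kept.append(s)
    return kept


def build_dataset(images: List[str], task_families: List[str], instruct: Agent,
                  edit: Agent, judge: Judge,
                  thresholds: Dict[str, float]) -> List[EditSample]:
    """Chain the three stages end to end."""
    candidates = expand_sources(images, task_families)
    edited = [synthesize(s, instruct, edit) for s in candidates]
    return verify(edited, judge, thresholds)


def run_demo() -> List[EditSample]:
    """Run the pipeline with stub agents; a real system would call models here."""
    def instruct(s: EditSample) -> str:
        return f"{s.task_family} edit for {s.source_image}"

    def edit(s: EditSample) -> str:
        return s.source_image.replace(".jpg", f"_{s.task_family}.jpg")

    def judge(s: EditSample) -> float:
        return 0.9 if s.task_family == "add_object" else 0.4

    return build_dataset(
        ["a.jpg", "b.jpg"],
        ["add_object", "style_transfer"],
        instruct, edit, judge,
        {"add_object": 0.8, "style_transfer": 0.6},
    )
```

Calling `run_demo()` produces four candidate samples and keeps only those that pass their task's threshold. Keying the thresholds by task family is one plausible reading of "task-aware" verification; the summary does not describe the actual mechanism.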

Abstract

Instruction-based image editing has emerged as a key capability for unified multimodal models (UMMs), yet constructing large-scale, diverse, and high-quality editing datasets without costly proprietary APIs remains challenging. Previous image editing datasets either rely on closed-source models for annotation, which prevents cost-effective scaling, or employ fixed synthetic editing pipelines, which suffer from limited quality and generalizability. To address these challenges, we propose ScaleEditor, a fully open-source hierarchical multi-agent framework for end-to-end construction of large-scale, high-quality image editing datasets. Our pipeline consists of three key components: source image expansion with world-knowledge infusion, adaptive multi-agent editing instruction-image synthesis, and a task-aware data quality verification mechanism. Using ScaleEditor, we curate ScaleEdit-12M, the largest open-source image editing dataset to date, spanning 23 task families across diverse real and synthetic domains. Fine-tuning UniWorld-V1 and Bagel on ScaleEdit yields consistent gains, improving performance by up to 10.4% on ImgEdit and 35.1% on GEdit for general editing benchmarks and by up to 150.0% on RISE and 26.5% on KRIS-Bench for knowledge-infused benchmarks. These results demonstrate that open-source, agentic pipelines can approach commercial-grade data quality while retaining cost-effectiveness and scalability. Both the framework and dataset will be open-sourced.
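To put the reported percentages in perspective, the short sketch below computes a relative improvement over a baseline score. It assumes the figures in the abstract are relative gains, which the abstract does not state explicitly. The function name `relative_gain` and the example scores are hypothetical and are not taken from the paper.

```python
def relative_gain(baseline: float, tuned: float) -> float:
    """Return the improvement of `tuned` over `baseline` as a percentage."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (tuned - baseline) / baseline * 100.0


# Hypothetical scores, chosen only to illustrate the arithmetic.
example_large = relative_gain(20.0, 50.0)   # score grows to 2.5x the baseline
example_small = relative_gain(4.0, 5.0)     # score grows to 1.25x the baseline
```

Under this reading, a 150.0% gain means the fine-tuned score is 2.5 times the baseline, so a gain that large is easiest to reach when the baseline score is low.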

