A$^2$-Edit: Precise Reference-Guided Image Editing of Arbitrary Objects and Ambiguous Masks

arXiv cs.CV / 3/12/2026

📰 NewsIdeas & Deep AnalysisModels & Research

共有:

Key Points

A²-Edit is a unified inpainting framework enabling precise reference-guided editing of arbitrary objects using only a coarse mask.
It introduces UniEdit-500K, a large-scale multi-category dataset with 8 major categories, 209 subcategories, and 500,104 image pairs to address homogenization and coverage.
It introduces a Mixture of Transformer module that enables dynamic expert selection to model different object categories and improve cross-category transfer and generalization.
It proposes a Mask Annealing Training Strategy (MATS) that progressively relaxes mask precision during training to improve robustness across diverse editing tasks, with strong performance on benchmarks such as VITON-HD and AnyInsertion.

Abstract

We propose \textbf{A

^2

-Edit}, a unified inpainting framework for arbitrary object categories, which allows users to replace any target region with a reference object using only a coarse mask. To address the issues of severe homogenization and limited category coverage in existing datasets, we construct a large-scale, multi-category dataset \textbf{UniEdit-500K}, which includes 8 major categories, 209 fine-grained subcategories, and a total of 500,104 image pairs. Such rich category diversity poses new challenges for the model, requiring it to automatically learn semantic relationships and distinctions across categories. To this end, we introduce the \textbf{Mixture of Transformer} module, which performs differentiated modeling of various object categories through dynamic expert selection, and further enhances cross-category semantic transfer and generalization through collaboration among experts. In addition, we propose a \textbf{Mask Annealing Training Strategy} (MATS) that progressively relaxes mask precision during training, reducing the model's reliance on accurate masks and improving robustness across diverse editing tasks. Extensive experiments on benchmarks such as VITON-HD and AnyInsertion demonstrate that A

^2

-Edit consistently outperforms existing approaches across all metrics, providing a new and efficient solution for arbitrary object editing.

Astral to Join OpenAI

Dev.to

PearlOS. We gave swarm intelligence a local desktop environment and code control to self-evolve. Has been pretty incredible to see so far. Open source and free if you want your own.

Reddit r/LocalLLaMA

Why Data is Important for LLM

Dev.to

The Inference Market Is Consolidating. Agent Payments Are Still Nobody's Problem.

Dev.to

YouTube's Deepfake Shield for Politicians Changes Evidence Forever

Dev.to

A$^2$-Edit: Precise Reference-Guided Image Editing of Arbitrary Objects and Ambiguous Masks

Key Points

Abstract

Related Articles

Astral to Join OpenAI

PearlOS. We gave swarm intelligence a local desktop environment and code control to self-evolve. Has been pretty incredible to see so far. Open source and free if you want your own.

Why Data is Important for LLM

The Inference Market Is Consolidating. Agent Payments Are Still Nobody's Problem.

YouTube's Deepfake Shield for Politicians Changes Evidence Forever

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer