RSEdit: Text-Guided Image Editing for Remote Sensing

arXiv cs.CV / March 17, 2026

Key Points

  • RSEdit targets the artifacts and hallucinations that arise when general-domain text-guided image editors are applied to remote sensing (RS) imagery, which the authors attribute to limited RS knowledge in pretrained models and conditioning schemes misaligned with Earth observation data.
  • It unifies pretrained diffusion models (U-Net and DiT) into instruction-following RS editors through channel concatenation and in-context token concatenation, enabling precise, physically coherent edits while preserving geospatial content.
  • Trained on over 60,000 bi-temporal RS image pairs, RSEdit demonstrates strong gains over general and commercial baselines and generalizes across disaster impacts, urban growth, and seasonal shifts.
  • The authors will release code, pretrained models, evaluation protocols, training logs, and generated results for full reproducibility, with code available at the linked GitHub repository.
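The two conditioning schemes named above can be sketched in a few lines. This is not the authors' code: the NumPy arrays, function names, and shapes below are illustrative assumptions standing in for diffusion latents and transformer tokens, meant only to show where each concatenation happens.

```python
import numpy as np

def channel_concat_condition(noisy_latent, pre_event_latent):
    """U-Net-style conditioning (assumed): stack the pre-event image
    latent with the noisy latent along the channel axis, so the first
    convolution sees both (its input channels must be widened to match)."""
    # noisy_latent: (B, C, H, W), pre_event_latent: (B, C, H, W)
    return np.concatenate([noisy_latent, pre_event_latent], axis=1)

def token_concat_condition(noisy_tokens, pre_event_tokens):
    """DiT-style in-context conditioning (assumed): append the pre-event
    image tokens to the noisy-latent token sequence, letting self-attention
    mix the two streams."""
    # noisy_tokens: (B, N, D), pre_event_tokens: (B, M, D)
    return np.concatenate([noisy_tokens, pre_event_tokens], axis=1)

# Toy shapes: batch 1, 4 latent channels on an 8x8 grid; 64 tokens of dim 16.
z_t, z_pre = np.zeros((1, 4, 8, 8)), np.zeros((1, 4, 8, 8))
print(channel_concat_condition(z_t, z_pre).shape)   # -> (1, 8, 8, 8)

tok_t, tok_pre = np.zeros((1, 64, 16)), np.zeros((1, 64, 16))
print(token_concat_condition(tok_t, tok_pre).shape)  # -> (1, 128, 16)
```

Either way, the denoiser receives the pre-event image as context at every step, which is how a single pretrained backbone becomes an instruction-following bi-temporal editor.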

Abstract

General-domain text-guided image editors achieve strong photorealism but introduce artifacts, hallucinate objects, and break the orthographic constraints of remote sensing (RS) imagery. We trace this gap to two high-level causes: (i) limited RS world knowledge in pre-trained models, and (ii) conditioning schemes that misalign with the bi-temporal structure and spatial priors of Earth observation data. We present RSEdit, a unified framework that adapts pretrained text-to-image diffusion models - both U-Net and DiT - into instruction-following RS editors via channel concatenation and in-context token concatenation. Trained on over 60,000 semantically rich bi-temporal remote sensing image pairs, RSEdit learns precise, physically coherent edits while preserving geospatial content. Experiments show clear gains over general and commercial baselines, demonstrating strong generalizability across diverse scenarios including disaster impacts, urban growth, and seasonal shifts, positioning RSEdit as a robust data engine for downstream analysis. We will release code, pretrained models, evaluation protocols, training logs, and generated results for full reproducibility. Code: https://github.com/Bili-Sakura/RSEdit-Preview