Noise2Map: End-to-End Diffusion Model for Semantic Segmentation and Change Detection

arXiv cs.CV / 5/1/2026

Key Points

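  • Noise2Map is an end-to-end framework that repurposes the denoising process of diffusion models for two discriminative remote sensing tasks: semantic segmentation (SS) and change detection (CD).
  • It avoids the costly sampling procedure required by conventional generation-centric diffusion models and instead predicts semantic or change maps directly, using task-specific noise schedules and timestep conditioning.
  • The model is pretrained with self-supervised denoising and then fine-tuned with supervision, aiming for both interpretability and robustness.
  • By combining a shared backbone with task-specific noise schedulers, a single model can handle both SS and CD as multi-task learning.
  • In evaluations on SpaceNet7, WHU, and xView2 (building damage from wildfires), Noise2Map has the best average rank among seven models on semantic segmentation and also on change detection (cross-dataset rank metric: average F1, with IoU as tie-break). Ablations indicate robustness to the choice of noise scheduler and to timestep control.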
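
The summary does not spell out how the cross-dataset rank metric is computed. The sketch below shows one plausible reading: average each model's F1 and IoU over the datasets, then sort by average F1 and use average IoU only to break ties. The function name `rank_models`, the model and dataset names, and all scores are hypothetical placeholders, not results from the paper.

```python
from statistics import mean


def rank_models(scores):
    """Return model names, best first.

    scores maps model -> dataset -> (f1, iou).
    Assumed ordering: higher average F1 wins; average IoU breaks ties.
    """
    def sort_key(model):
        per_dataset = list(scores[model].values())
        avg_f1 = mean(f1 for f1, _ in per_dataset)
        avg_iou = mean(iou for _, iou in per_dataset)
        # Negate so that sorted() puts the highest scores first.
        return (-avg_f1, -avg_iou)

    return sorted(scores, key=sort_key)


# Toy numbers for illustration only (NOT taken from the paper).
toy_scores = {
    "model_a": {"dataset_1": (0.75, 0.5), "dataset_2": (0.5, 0.25)},
    "model_b": {"dataset_1": (0.625, 0.5), "dataset_2": (0.625, 0.5)},
    "model_c": {"dataset_1": (0.5, 0.25), "dataset_2": (0.5, 0.25)},
}

ranking = rank_models(toy_scores)
print(ranking)
```

In this toy data, `model_a` and `model_b` share the same average F1 (0.625), so the IoU tie-break decides their order.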

Abstract

Semantic segmentation and change detection are two fundamental challenges in remote sensing, requiring models to capture either spatial semantics or temporal differences from satellite imagery. Existing deep learning models often struggle with temporal inconsistencies or with capturing fine-grained spatial structures, require extensive pretraining, and offer limited interpretability, especially in real-world remote sensing scenarios. Recent advances in diffusion models show that Gaussian noise can be systematically leveraged to learn expressive data representations through denoising. Motivated by this, we investigate whether the noise process in diffusion models can be effectively utilized for discriminative tasks. We propose Noise2Map, a unified diffusion-based framework that repurposes the denoising process for fast, end-to-end discriminative learning. Unlike prior work that uses diffusion only for generation or feature extraction, Noise2Map directly predicts semantic or change maps using task-specific noise schedules and timestep conditioning, avoiding the costly sampling procedures of traditional diffusion models. The model is pretrained via self-supervised denoising and fine-tuned with supervision, enabling both interpretability and robustness. Our architecture supports both tasks (SS and CD) through a shared backbone and task-specific noise schedulers. Extensive evaluations on the SpaceNet7, WHU, and xView2 (buildings damaged by wildfires) datasets demonstrate that Noise2Map ranks on average 1st among seven models on semantic segmentation and 1st on change detection by a cross-dataset rank metric (average F1 primary, IoU tie-break). Ablation studies highlight the robustness of our model to different training noise schedulers and to timestep control in the diffusion process, as well as the ability of the model to perform multi-task learning.
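
To make "task-specific noise schedules and timestep conditioning" more concrete, the sketch below shows the standard DDPM-style forward noising step, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, with a separate schedule per task. This is a generic illustration, not the paper's implementation: the linear beta schedule, the `beta_end` values, the step count, and the names `linear_alpha_bar`, `add_noise`, and `SCHEDULES` are all assumptions, and the abstract does not state which schedules or inputs Noise2Map actually uses.

```python
import math
import random


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule."""
    alpha_bar, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def add_noise(x0, t, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar[t]) * x0 + sqrt(1 - alpha_bar[t]) * eps."""
    a = alpha_bar[t]
    return [
        math.sqrt(a) * value + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
        for value in x0
    ]


# Hypothetical per-task schedules; the values are illustrative assumptions.
SCHEDULES = {
    "ss": linear_alpha_bar(1000, beta_end=0.02),
    "cd": linear_alpha_bar(1000, beta_end=0.01),
}

rng = random.Random(0)
pixels = [0.2, 0.5, 0.9]  # stand-in for normalized image values
noisy = add_noise(pixels, t=100, alpha_bar=SCHEDULES["ss"], rng=rng)
```

In a discriminative setup of the kind the abstract describes, a network would receive the noised input together with the timestep `t` and output the semantic or change map in a single forward pass, rather than iterating over many reverse-diffusion steps.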