Generating Satellite Imagery Data for Wildfire Detection through Mask-Conditioned Generative AI

arXiv cs.AI / 4/6/2026


Key Points

  • Labeled satellite imagery is a key bottleneck for deep-learning wildfire monitoring, and this paper tests whether EarthSynth (a diffusion foundation model for Earth observation) can generate realistic post-wildfire Sentinel-2 RGB images conditioned on burn masks without task-specific retraining.
  • The study uses CalFireSeg-50-derived burn masks and compares six controlled setups that vary generation pipeline type (mask-only full generation vs. mask-conditioned inpainting with pre-fire context), prompt strategy (hand-crafted vs. VLM-generated using Qwen2-VL), and region-wise color-matching post-processing.
  • Quantitative evaluation on 10 stratified test samples uses four metrics (Burn IoU, burn-region color distance, darkness contrast, and spectral plausibility); across all four, inpainting consistently beats full-tile generation.
  • The best results come from the structured inpainting prompt, improving spatial alignment and burn saliency (e.g., Burn IoU = 0.456 and Darkness Contrast = 20.44), while color matching reduces color distance but can weaken burn saliency.
  • The authors conclude that VLM-assisted inpainting is competitive with hand-crafted prompts and that generative data augmentation could be integrated into wildfire detection pipelines, with code and experiments published on Kaggle.
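The paper's exact metric definitions are not reproduced here, but two of the four (Burn IoU and Darkness Contrast) can be sketched from their names. The following is a minimal illustration, assuming Burn IoU compares a burn mask recovered from the generated image against the conditioning mask, and Darkness Contrast measures how much darker the burn region is than its surroundings; function names and the grayscale convention are this sketch's assumptions, not the authors'.

```python
import numpy as np

def burn_iou(pred_mask: np.ndarray, true_mask: np.ndarray) -> float:
    """Intersection-over-union between a burn mask detected in the
    generated image and the conditioning burn mask (boolean arrays)."""
    inter = np.logical_and(pred_mask, true_mask).sum()
    union = np.logical_or(pred_mask, true_mask).sum()
    return float(inter / union) if union else 0.0

def darkness_contrast(image: np.ndarray, mask: np.ndarray) -> float:
    """Mean brightness outside the burn region minus mean brightness
    inside it; burned areas should look darker, so higher is better."""
    gray = image.mean(axis=-1)  # simple average over RGB channels
    return float(gray[~mask].mean() - gray[mask].mean())
```

Under these definitions, a perfectly aligned, clearly darkened burn scar yields a Burn IoU near 1 and a large positive Darkness Contrast, matching the direction of the reported numbers (0.456 and 20.44).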

Abstract

The scarcity of labeled satellite imagery remains a fundamental bottleneck for deep-learning (DL)-based wildfire monitoring systems. This paper investigates whether a diffusion-based foundation model for Earth Observation (EO), EarthSynth, can synthesize realistic post-wildfire Sentinel-2 RGB imagery conditioned on existing burn masks, without task-specific retraining. Using burn masks derived from the CalFireSeg-50 dataset (Martin et al., 2025), we design and evaluate six controlled experimental configurations that systematically vary: (i) pipeline architecture (mask-only full generation vs. inpainting with pre-fire context), (ii) prompt engineering strategy (three hand-crafted prompts and a VLM-generated prompt via Qwen2-VL), and (iii) a region-wise color-matching post-processing step. Quantitative assessment on 10 stratified test samples uses four complementary metrics: Burn IoU, burn-region color distance (ΔC_burn), Darkness Contrast, and Spectral Plausibility. Results show that inpainting-based pipelines consistently outperform full-tile generation across all metrics, with the structured inpainting prompt achieving the best spatial alignment (Burn IoU = 0.456) and burn saliency (Darkness Contrast = 20.44), while color matching produces the lowest color distance (ΔC_burn = 63.22) at the cost of reduced burn saliency. VLM-assisted inpainting is competitive with hand-crafted prompts. These findings provide a foundation for incorporating generative data augmentation into wildfire detection pipelines. Code and experiments are available at: https://www.kaggle.com/code/valeriamartinh/genai-all-runned
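The abstract does not specify how the region-wise color-matching step works; a common realization of the idea is Reinhard-style moment matching, where each RGB channel inside the burn mask is shifted and scaled so its mean and standard deviation match a reference (e.g., a real post-fire tile). The sketch below illustrates that assumed approach; the function name and the moment-matching choice are this example's assumptions, not necessarily the paper's implementation.

```python
import numpy as np

def match_region_color(generated: np.ndarray, reference: np.ndarray,
                       mask: np.ndarray) -> np.ndarray:
    """Shift/scale each RGB channel of `generated` inside `mask` so its
    mean and std match the corresponding reference pixels (assumed
    Reinhard-style moment matching, applied region-wise)."""
    out = generated.astype(np.float64).copy()
    for c in range(3):
        gen_px = out[..., c][mask]
        ref_px = reference[..., c][mask].astype(np.float64)
        scale = ref_px.std() / (gen_px.std() + 1e-8)  # avoid div by zero
        out[..., c][mask] = (gen_px - gen_px.mean()) * scale + ref_px.mean()
    return np.clip(out, 0, 255).astype(np.uint8)
```

This kind of correction pulls the generated burn region toward realistic colors (lowering ΔC_burn) but, by moving pixel statistics toward the reference, can also flatten the dark burn signature, which is consistent with the reported trade-off against burn saliency.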