Flow matching for Sentinel-2 super-resolution: implementation, application, and implications

arXiv cs.CV / 5/4/2026


Key Points

  • The paper introduces a flow matching model for 4× super-resolution of Sentinel-2 10 m visible/NIR bands (resolving 10 m → 2.5 m) using paired Sentinel-2 and same-day NAIP imagery, addressing the usual trade-off between spectral fidelity and perceptual quality.
  • In experiments, the flow matching approach beats diffusion and Real-ESRGAN on pixel-wise accuracy in a single Euler sampling step, and with a second-order Midpoint solver it produces perceptually realistic outputs in only 20 sampling steps without retraining.
  • The authors deploy the model to generate a full 2.5 m, 4-band CONUS super-resolved product from 2025 Sentinel-2 annual composites (over 1.58 trillion pixels) and also derive yearly 2.5 m land-cover products for the Chesapeake Bay watershed (2020–2025).
  • For downstream use, the super-resolved imagery was evaluated on a land-cover classification task with semantic segmentation models; the resulting annual Chesapeake Bay land-cover product reaches 89.11% overall accuracy against 25,000 ground-truth points.
  • Overall, the work concludes that flow matching is an effective generative modeling alternative to diffusion and GAN-based super-resolution for satellite data, with implications for wider access to high-resolution geospatial inputs.
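The single-step Euler and 20-step Midpoint sampling described above amount to integrating the learned velocity-field ODE from noise to image with different solvers. A minimal NumPy sketch of that sampler structure (with a toy velocity field standing in for the trained network; the function and variable names are illustrative, not from the paper):

```python
import numpy as np

def sample(v, x0, steps=1, method="euler"):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (image).

    v      : velocity field (a toy callable below; a trained net in practice).
    method : "euler" (1st order) or "midpoint" (2nd order).
    """
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        t = i * dt
        if method == "euler":
            x = x + dt * v(x, t)
        else:  # midpoint: re-evaluate v halfway through the step
            x_mid = x + 0.5 * dt * v(x, t)
            x = x + dt * v(x_mid, t + 0.5 * dt)
    return x

# Toy stand-in velocity field pulling x toward a fixed target x1
# (exact ODE solution: x(t) = x1 + (x0 - x1) * exp(-t)).
x1 = np.ones(4)
v = lambda x, t: x1 - x
x0 = np.zeros(4)
print(sample(v, x0, steps=1, method="euler"))      # one big Euler step
print(sample(v, x0, steps=20, method="midpoint"))  # ≈ 1 - e^(-1) ≈ 0.632
```

The paper's key inference-time result maps onto this structure: the same trained field can be run with `steps=1` Euler for pixel-accurate output or a 20-step Midpoint schedule for perceptually realistic output, with no retraining, since only the solver changes.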

Abstract

Developing robust techniques for super-resolution of satellite imagery involves navigating commonly observed trade-offs between spectral fidelity and perceptual quality. In this work, we introduce a flow matching model for 4× super-resolution of 10-m Sentinel-2 visible and near-infrared bands over the conterminous United States (CONUS) using a dataset of 120,851 10-m Sentinel-2 and 2.5-m resampled NAIP imagery pairs acquired on the same day. Our results showed that the flow matching model outperformed diffusion and Real-ESRGAN models in pixel-wise accuracy in a single sampling step using the Euler method. When evaluated with a second-order Midpoint solver, our model generated perceptually realistic super-resolved imagery in only 20 sampling steps, effectively navigating the perception-distortion trade-off at inference time without retraining. We used this model to produce a super-resolved 2.5-m 4-band CONUS imagery product derived from 2025 10-m Sentinel-2 annual composites, consisting of over 1.58 trillion pixels. We further evaluated the use of super-resolved data on a land cover classification task using semantic segmentation models. Finally, we generated a yearly 2.5-m land cover product for the Chesapeake Bay watershed for 2020-2025. An accuracy assessment against 25,000 ground truth points revealed an overall accuracy of 89.11% for the annual land cover product. We conclude that flow matching is an effective generative modeling approach for super-resolution of Sentinel-2 imagery compared to diffusion and Generative Adversarial Network-based methods, and has strong implications for expanding access to high-resolution imagery for geospatial applications that demand fine spatial detail.
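For readers unfamiliar with the training side of flow matching, the standard conditional flow matching objective regresses a network onto the constant velocity of a straight noise-to-image path. The sketch below is the generic formulation only, not the authors' architecture or code; `v_theta`, `hr`, and `lr` are illustrative names for the velocity network, the high-res target, and the low-res conditioning input:

```python
import numpy as np

rng = np.random.default_rng(0)

def fm_loss(v_theta, hr, lr):
    """One conditional flow matching training loss (generic sketch).

    hr : high-res target patch x1 (NAIP-like in this setting).
    lr : low-res conditioning input (Sentinel-2-like).
    Linear path x_t = (1 - t) * x0 + t * x1 with x0 ~ N(0, I);
    the regression target is the constant velocity x1 - x0.
    """
    x1 = hr
    x0 = rng.standard_normal(x1.shape)
    t = rng.uniform()
    x_t = (1.0 - t) * x0 + t * x1
    pred = v_theta(x_t, t, lr)           # network's velocity estimate
    return np.mean((pred - (x1 - x0)) ** 2)

# Untrained stand-in network: predicts zero velocity everywhere.
hr = rng.standard_normal((4, 8, 8))      # 4 bands (RGB + NIR), toy 8x8 patch
lr = hr[:, ::4, ::4]                     # 4x-downsampled conditioning input
loss = fm_loss(lambda x, t, c: np.zeros_like(x), hr, lr)
print(loss)  # positive: a zero predictor misses the target velocity
```

Because the target velocity along this straight path is constant, a well-trained model can be accurate even under one Euler step, which is consistent with the single-step pixel-accuracy result reported above.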