Beyond Visual Fidelity: Benchmarking Super-Resolution Models for Large-Scale Remote Sensing Imagery via Downstream Task Integration

arXiv cs.AI / 5/4/2026


Key Points

  • The paper argues that super-resolution (SR) benchmarks based on fidelity metrics like PSNR and SSIM may not reflect real-world usefulness for Earth observation downstream tasks.
  • It introduces GeoSR-Bench, a task-integrated SR benchmark dataset with spatially co-located, temporally aligned, quality-controlled image pairs from ~36,000 locations and diverse land covers.
  • The dataset covers resolutions from 500m to 0.6m and supports multiple downstream monitoring tasks such as land cover segmentation, infrastructure mapping, and biophysical variable estimation.
  • Experiments benchmark GAN, transformer, neural operator, and diffusion-based SR models across 270 experimental settings: 2 cross-platform SR tasks, 9 SR models, 3 downstream task models, and 5 downstream tasks per SR task.
  • Results indicate that better traditional SR metrics can fail to improve downstream task performance and may even show negative correlation, motivating downstream-task integration into SR evaluation and development.
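The last point can be made concrete with a small sketch that checks whether a fidelity metric ranks models the same way a downstream metric does. This is not the paper's evaluation code: the helper names (`psnr`, `ranks`, `spearman`), the choice of Spearman rank correlation, and the four score pairs are all illustrative assumptions, and the numbers are made-up placeholders rather than results from GeoSR-Bench.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB for two equal-length pixel sequences."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Placeholder scores for four hypothetical SR models (NOT results from the paper).
psnr_scores = [28.1, 29.4, 30.2, 31.0]   # fidelity metric, higher is better
task_scores = [0.61, 0.58, 0.55, 0.52]   # downstream metric, e.g. mIoU

rho = spearman(psnr_scores, task_scores)
print(f"Spearman correlation: {rho:+.2f}")
```

With these placeholder values the model with the highest PSNR has the lowest downstream score, so the rank correlation is negative. That is the kind of disagreement the Key Points describe, shown here only on invented data.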

Abstract

Super-resolution (SR) techniques have made major advances in reconstructing high-resolution images from low-resolution inputs. The increased resolution provides visual enhancement and utility for monitoring tasks. In particular, SR has been increasingly developed for satellite-based Earth observation, with applications in urban planning, agriculture, ecology, and disaster response. However, existing SR studies and benchmarks typically use fidelity metrics such as PSNR or SSIM, whereas the true utility of super-resolved images lies in supporting downstream tasks such as land cover classification, biomass estimation, and change detection. To bridge this gap, we introduce GeoSR-Bench, a downstream task-integrated SR benchmark dataset to evaluate SR models beyond fidelity metrics. GeoSR-Bench comprises spatially co-located, temporally aligned, and quality-controlled image pairs from about 36,000 locations across diverse land covers, spanning resolutions from 500m to 0.6m. To the best of our knowledge, GeoSR-Bench is the first SR benchmark that directly connects improved image resolution from SR models with downstream Earth monitoring tasks, including land cover segmentation, infrastructure mapping, and biophysical variable estimation. Using GeoSR-Bench, we benchmark GAN, transformer, neural operator, and diffusion-based SR models on perceptual quality and downstream task performance. We conduct experiments with 270 settings, covering 2 cross-platform SR tasks, 9 SR models, 3 downstream task models, and 5 downstream tasks for each SR task. The results show that improvements in traditional SR metrics often do not correlate with gains in task performance, and the correlations can be negative, indicating that these metrics provide limited guidance for selecting superior models for downstream tasks. This reveals the need to integrate downstream tasks into SR model development and evaluation.
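The count of 270 settings follows from the factors listed in the abstract: 2 × 9 × 3 × 5 = 270. The sketch below enumerates such a grid to confirm the arithmetic. The identifiers are hypothetical placeholders, since the summary does not name the individual SR tasks, models, or downstream tasks, and it assumes for simplicity that each SR task pairs with five downstream tasks labeled the same way.

```python
from itertools import product

# Placeholder identifiers; only the counts come from the abstract.
sr_tasks = [f"sr_task_{i}" for i in range(1, 3)]                  # 2 cross-platform SR tasks
sr_models = [f"sr_model_{i}" for i in range(1, 10)]               # 9 SR models
task_models = [f"task_model_{i}" for i in range(1, 4)]            # 3 downstream task models
downstream_tasks = [f"downstream_task_{i}" for i in range(1, 6)]  # 5 downstream tasks per SR task

# One experimental setting per combination of the four factors.
settings = list(product(sr_tasks, sr_models, task_models, downstream_tasks))

print(len(settings))  # 270
```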