Edge-Cloud Collaborative Reconstruction via Structure-Aware Latent Diffusion for Downstream Remote Sensing Perception

arXiv cs.CV · April 29, 2026

Key Points

  • The paper addresses how extreme high-ratio satellite downlink compression irreversibly destroys high-frequency structural details needed for downstream remote sensing perception tasks.
  • It proposes Structure-Aware Latent Diffusion (SALD), an asymmetric edge-cloud super-resolution framework in which the edge transmits a highly compressed low-frequency payload plus a lightweight soft structural prior (a sketch of one possible decoupling appears below this list).
  • On the cloud side, SALD adds a Structure-Gated Large Kernel (SGLK) module and a Semantic-Guidance Engine (SGE) that use the received structural priors to gate large-kernel convolutions, capturing long-range aerial dependencies while suppressing structural hallucinations (see the gating sketch after the abstract).
  • Experiments on MSCM and UCMerced show that SALD improves perceptual quality (LPIPS) under extreme bandwidth constraints and boosts downstream scene classification and small-target detection performance.
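
The summary does not spell out how the edge performs the payload/structure decoupling, so the sketch below is only a plausible reading: a bicubic downsample stands in for the compressed low-frequency payload, and a normalized Sobel gradient map stands in for the soft structural prior. The function name `edge_decouple` and both operator choices are hypothetical, not the paper's actual design.

```python
import torch
import torch.nn.functional as F

def edge_decouple(img: torch.Tensor, scale: int = 4):
    """Split an image into a low-frequency payload and a soft structural prior.

    `img` is an (N, 3, H, W) tensor in [0, 1]. The bicubic downsample and the
    Sobel edge map below are illustrative stand-ins; the paper's actual
    decomposition may differ.
    """
    # Low-frequency payload: heavy downsampling before codec compression.
    low_freq = F.interpolate(img, scale_factor=1 / scale, mode="bicubic",
                             align_corners=False)

    # Soft structural prior: normalized Sobel gradient magnitude on luminance.
    gray = img.mean(dim=1, keepdim=True)
    sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    kx = sobel_x.view(1, 1, 3, 3)
    ky = sobel_x.t().view(1, 1, 3, 3)
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    grad = torch.sqrt(gx ** 2 + gy ** 2)
    prior = grad / (grad.amax(dim=(-2, -1), keepdim=True) + 1e-8)

    # The prior stays "soft" (continuous in [0, 1]) and low-resolution so it
    # adds little on top of the downlink payload.
    prior = F.interpolate(prior, scale_factor=1 / scale, mode="bilinear",
                          align_corners=False)
    return low_freq, prior
```

Keeping the prior soft and low-resolution is what makes the decoupled representation cheap to transmit while still telling the cloud-side model where real structure lies.
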

Abstract

The exponential surge in high-resolution remote sensing data faces a severe bottleneck in satellite-to-ground transmission. Limited downlink bandwidth forces the use of extreme high-ratio compression, which irreversibly destroys high-frequency structural details essential for downstream machine perception tasks like object detection. While current super-resolution (SR) techniques attempt to recover these details, regression-based methods often yield over-smoothed textures, and generative diffusion models frequently introduce structural hallucinations that mislead detection systems. To address this trade-off, we propose the Structure-Aware Latent Diffusion (SALD) framework, an asymmetric edge-cloud collaborative SR system. At the resource-constrained edge, the system decouples imagery into a highly compressed low-frequency payload and a lightweight soft structural prior. Transmitting this decoupled representation minimizes bandwidth consumption. On the powerful cloud side, we introduce a Structure-Gated Large Kernel (SGLK) module and a Semantic-Guidance Engine (SGE) within the diffusion backbone. These modules leverage the transmitted structural priors to gate large-kernel convolutions, effectively capturing long-range dependencies inherent in aerial scenes while actively suppressing generative hallucinations. Extensive experiments on both the MSCM and UCMerced datasets demonstrate that, even under extreme bandwidth constraints, SALD achieves superior perceptual quality (LPIPS) and significantly enhances downstream performance in both scene classification and small-target detection.
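
As a concrete illustration of the gating described in the abstract, the following is a minimal sketch of how a transmitted structural prior could gate a depthwise large-kernel convolution inside a diffusion backbone. The class name `StructureGatedLargeKernel`, the 13×13 kernel size, and the sigmoid gate are assumptions; the paper's exact SGLK design and its Semantic-Guidance Engine are not specified in this summary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StructureGatedLargeKernel(nn.Module):
    """Illustrative structure-gated large-kernel block (not the paper's exact SGLK).

    A depthwise large-kernel convolution captures long-range context, and a
    gate computed from the transmitted structural prior decides where that
    context is injected into the features.
    """

    def __init__(self, channels: int, kernel_size: int = 13):
        super().__init__()
        self.large_kernel = nn.Conv2d(channels, channels, kernel_size,
                                      padding=kernel_size // 2,
                                      groups=channels)   # depthwise, long-range
        self.pointwise = nn.Conv2d(channels, channels, 1)
        self.gate = nn.Sequential(                        # prior -> per-pixel gate
            nn.Conv2d(1, channels, 3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, feat: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
        # Resize the structural prior to the feature resolution.
        prior = F.interpolate(prior, size=feat.shape[-2:], mode="bilinear",
                              align_corners=False)
        context = self.pointwise(self.large_kernel(feat))
        g = self.gate(prior)
        # Gated residual: long-range context is added mainly where the prior
        # indicates real structure, damping hallucinated detail elsewhere.
        return feat + g * context
```

A block like this could sit at each resolution of the cloud-side denoising network, with `prior` taken from the transmitted structural map; the SGE's semantic conditioning would be a separate pathway and is omitted here.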