
Bin Wan, G2HFNet: GeoGran-Aware Hierarchical Feature Fusion Network for Salient Object Detection in Optical Remote Sensing Images

arXiv cs.CV / 3/16/2026

📰 News | Models & Research

Key Points

  • The paper proposes G2HFNet, a GeoGran-Aware Hierarchical Feature Fusion Network that uses a Swin Transformer backbone to extract multi-level features for salient object detection in optical remote sensing images.
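  • It introduces three modules: the multi-scale detail enhancement (MDE) module to handle object scale variations and enrich fine details, the dual-branch geo-gran complementary (DGC) module to capture fine-grained details and positional information in mid-level features, and the deep semantic perception (DSP) module to refine high-level positional cues through self-attention.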
  • A local-global guidance fusion (LGF) module replaces traditional convolutions to integrate multi-level features more effectively.
  • The authors report that extensive experiments show G2HFNet producing high-quality saliency maps and significantly improving detection performance in challenging remote sensing scenarios.
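
To make "multi-level features" concrete, the sketch below computes the feature-map shapes that a standard Swin Transformer backbone produces at each of its four stages: spatial strides of 4, 8, 16, and 32 with channel widths C, 2C, 4C, and 8C. The summary does not say which Swin variant or input resolution G2HFNet uses, so `embed_dim=96` (the Swin-T width), the 384×384 input, and the function name `swin_stage_shapes` are illustrative assumptions, not details from the paper.

```python
from typing import List, Tuple


def swin_stage_shapes(
    height: int,
    width: int,
    embed_dim: int = 96,   # assumption: Swin-T base width; the paper's variant is not stated here
    patch_size: int = 4,   # standard Swin patch-embedding stride
    num_stages: int = 4,   # standard Swin stage count
) -> List[Tuple[int, int, int]]:
    """Return (channels, height, width) for each stage's output feature map."""
    max_stride = patch_size * 2 ** (num_stages - 1)
    if height % max_stride or width % max_stride:
        raise ValueError(f"input size must be divisible by {max_stride}")

    shapes = []
    for stage in range(num_stages):
        stride = patch_size * 2 ** stage    # 4, 8, 16, 32
        channels = embed_dim * 2 ** stage   # C, 2C, 4C, 8C
        shapes.append((channels, height // stride, width // stride))
    return shapes


if __name__ == "__main__":
    # Hypothetical 384x384 input, chosen only for illustration.
    for level, shape in enumerate(swin_stage_shapes(384, 384), start=1):
        print(f"level {level}: C={shape[0]}, H={shape[1]}, W={shape[2]}")
```

The shallow levels keep high spatial resolution with few channels, while the deep levels are coarse but channel-rich. That is the usual reason for treating low-, mid-, and high-level features with different modules, as the key points describe.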

Abstract

Remote sensing images captured from aerial perspectives often exhibit significant scale variations and complex backgrounds, posing challenges for salient object detection (SOD). Existing methods typically extract multi-level features at a single scale using uniform attention mechanisms, leading to suboptimal representations and incomplete detection results. To address these issues, we propose a GeoGran-Aware Hierarchical Feature Fusion Network (G2HFNet) that fully exploits geometric and granular cues in optical remote sensing images. Specifically, G2HFNet adopts Swin Transformer as the backbone to extract multi-level features and integrates three key modules: the multi-scale detail enhancement (MDE) module to handle object scale variations and enrich fine details, the dual-branch geo-gran complementary (DGC) module to jointly capture fine-grained details and positional information in mid-level features, and the deep semantic perception (DSP) module to refine high-level positional cues via self-attention. Additionally, a local-global guidance fusion (LGF) module is introduced to replace traditional convolutions for effective multi-level feature integration. Extensive experiments demonstrate that G2HFNet achieves high-quality saliency maps and significantly improves detection performance in challenging remote sensing scenarios.
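
The abstract says the DSP module refines high-level cues "via self-attention" but gives no further detail. As a reminder of what that operation does in general, here is a minimal pure-Python sketch of scaled dot-product self-attention over a small set of feature tokens. It is not the DSP module: the learned query, key, and value projections are replaced by the identity, and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: Matrix) -> Matrix:
    """Scaled dot-product self-attention with identity Q/K/V projections.

    Assumption: a real module would first apply learned linear projections;
    they are omitted so the example stays self-contained.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    output = []
    for query in tokens:
        # Similarity of this token to every token, including itself.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        weights = softmax(scores)
        # Each output token is a weighted average of all tokens.
        output.append(
            [sum(w * value[j] for w, value in zip(weights, tokens)) for j in range(dim)]
        )
    return output


if __name__ == "__main__":
    # Three hypothetical 2-dimensional tokens, e.g. flattened deep-feature positions.
    for row in self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]):
        print([round(x, 3) for x in row])
```

Because every output token mixes information from all positions, self-attention suits the coarse, low-resolution deepest features, where global context is cheap to compute.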