CGFformer: Cluster-Guidance Frequency Transformer for Pansharpening

arXiv cs.CV / 5/5/2026

Key Points

The paper introduces CGFformer, a cluster-guidance frequency Transformer for pansharpening. It targets a limitation of fixed frequency filters: they cannot adapt to the spatially varying frequency distributions of PAN and MS images.
It uses an adaptive separation module with K-means clustering to better distinguish high- and low-frequency components by combining local features with non-local information.
A dual-stream refinement module with Transformer-based cross-attention removes several kinds of noise, jointly suppressing frequency-relevant and frequency-irrelevant disturbances.
The model also includes a frequency-spatial fusion module to improve detail reconstruction and enable stronger interaction between spatial and frequency representations.
Experiments on multiple benchmark datasets reportedly show CGFformer delivers notable performance gains over existing pansharpening methods.
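
To make the idea of cluster-guided frequency separation concrete, here is a minimal pure-Python sketch. It is not the paper's adaptive separation module. It assumes a box blur as the low-pass filter, local variance as the only local feature, and a two-cluster K-means over that feature as the non-local grouping. The function names and the radii are hypothetical choices for illustration.

```python
def window_values(img, y, x, radius):
    """Collect pixel values in a square window, truncated at the image borders."""
    h, w = len(img), len(img[0])
    return [img[yy][xx]
            for yy in range(max(0, y - radius), min(h, y + radius + 1))
            for xx in range(max(0, x - radius), min(w, x + radius + 1))]

def mean(vals):
    return sum(vals) / len(vals)

def variance(vals):
    m = mean(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

def local_stat(img, radius, stat):
    """Apply a statistic (mean or variance) over a sliding window."""
    h, w = len(img), len(img[0])
    return [[stat(window_values(img, y, x, radius)) for x in range(w)]
            for y in range(h)]

def kmeans_1d(values, k=2, iters=20):
    """Plain K-means on scalars with evenly spaced initial centers.
    Label 0 ends up as the lowest-valued cluster."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = mean(members)
    return labels, centers

def adaptive_split(img, radii=(2, 1)):
    """Split img into low- and high-frequency parts, with the blur radius
    chosen per pixel by its cluster (assumption: smooth cluster gets radii[0],
    textured cluster gets radii[1])."""
    h, w = len(img), len(img[0])
    var = local_stat(img, 1, variance)            # local feature
    flat = [v for row in var for v in row]
    labels, _ = kmeans_1d(flat, k=len(radii))     # groups pixels regardless of position
    blurred = [local_stat(img, r, mean) for r in radii]
    low = [[blurred[labels[y * w + x]][y][x] for x in range(w)] for y in range(h)]
    high = [[img[y][x] - low[y][x] for x in range(w)] for y in range(h)]
    label_map = [labels[y * w:(y + 1) * w] for y in range(h)]
    return low, high, label_map

# Toy image: flat left half, checkerboard right half.
img = [[0.5] * 4 + [float((y + x) % 2) for x in range(4, 8)] for y in range(6)]
low, high, labels = adaptive_split(img)
```

A fixed-filter baseline would call `local_stat(img, r, mean)` with a single radius everywhere. Here the radius depends on the cluster, so flat and textured regions are filtered differently, and `low + high` still reconstructs the input.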

Abstract

Pansharpening aims to generate high-resolution multispectral (HRMS) images by fusing low-resolution multispectral (LRMS) images with high-resolution panchromatic (PAN) images. However, the current mainstream frequency-based pansharpening methods employ fixed frequency filters, which cannot precisely adapt to complex and spatially diversified frequency distributions in PAN and MS images. Furthermore, existing denoising strategies insufficiently exploit frequency components for denoising and struggle to suppress various noise types accurately. To address these challenges, we propose CGFformer, a cluster-guidance frequency Transformer that focuses on varying frequency distributions and on interactions between frequency and spatial components. Specifically, we design an adaptive separation module that integrates local features and non-local information through K-means clustering, enabling more precise separation of high- and low-frequency components. Subsequently, we introduce a dual-stream refinement module combined with Transformer-based cross-attention to remove various types of noise, allowing the network to jointly suppress frequency-relevant and frequency-irrelevant disturbances. In addition, we develop a frequency-spatial fusion module designed to enhance detail and facilitate spatial-frequency interaction, ensuring more effective reconstruction of spatial structures in the fused results. Extensive experiments on multiple benchmark datasets demonstrate that the proposed CGFformer achieves notable improvements over existing pansharpening approaches.
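
The cross-attention used for dual-stream refinement can be illustrated with a small pure-Python sketch of scaled dot-product attention between two token streams. This is a simplification under stated assumptions, not the paper's module: it omits the learned query/key/value projections, multiple heads, and residual connections, and the names `cross_attention`, `high_tokens`, and `low_tokens` are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention where queries come from one stream and
    keys/values come from the other. Each argument is a list of token vectors.
    Returns the attended outputs and the attention weights."""
    d = len(queries[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights

# Two toy streams of 2-D tokens (assumed stand-ins for high- and
# low-frequency features).
high_tokens = [[1.0, 0.0], [0.5, 0.5]]
low_tokens = [[0.9, 0.1], [0.0, 1.0], [0.4, 0.6]]

# Each stream queries the other, so information flows in both directions.
refined_high, w_high = cross_attention(high_tokens, low_tokens, low_tokens)
refined_low, w_low = cross_attention(low_tokens, high_tokens, high_tokens)
```

Each output token is a weighted average of the other stream's tokens, with weights that sum to 1 per query. That is the mechanism by which one stream can emphasize or downweight content in the other.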