SSFT: A Lightweight Spectral-Spatial Fusion Transformer for Generic Hyperspectral Classification

arXiv cs.CV / 4/20/2026

Key Points

  • The paper introduces SSFT, a lightweight Spectral-Spatial Fusion Transformer for hyperspectral classification that targets high dimensionality, spectral redundancy, limited labeled data, and strong domain shifts.
  • SSFT factorizes representation learning into separate spectral and spatial pathways and fuses them using cross-attention to capture complementary wavelength-dependent and structural information.
  • On the heterogeneous HSI-Benchmark (covering earth observation, fruit condition assessment, and fine-grained material recognition), SSFT achieves state-of-the-art overall performance while using under 2% of the parameters of the previous leading method.
  • The authors also test transfer performance on the larger SpectralEarth benchmark and find SSFT remains competitive despite its compact model size.
  • Ablation results indicate both pathways are necessary, with spatial modeling contributing most, and the method stays robust even without data augmentation.

Abstract

Hyperspectral imaging (HSI) enables fine-grained recognition of materials by capturing rich spectral signatures, but learning robust classifiers is challenging due to high dimensionality, spectral redundancy, limited labeled data, and strong domain shifts. Beyond earth observation, labeled HSI data is often scarce and imbalanced, motivating compact models for generic hyperspectral classification across diverse acquisition regimes. We propose the lightweight Spectral-Spatial Fusion Transformer (SSFT), which factorizes representation learning into spectral and spatial pathways and integrates them via cross-attention to capture complementary wavelength-dependent and structural information. We evaluate SSFT on the challenging HSI-Benchmark, a heterogeneous multi-dataset benchmark covering earth observation, fruit condition assessment, and fine-grained material recognition. SSFT achieves state-of-the-art overall performance, ranking first while using less than 2% of the parameters of the previous leading method. We further evaluate transfer to the substantially larger SpectralEarth benchmark under the official protocol, where SSFT remains competitive despite its compact size. Ablation studies show that both spectral and spatial pathways are crucial, with spatial modeling contributing most, and that SSFT remains robust without data augmentation.
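
The two-pathway design with cross-attention fusion described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the patch size, band count, band-group tokenization, embedding width, random weights, and the choice to use spectral tokens as queries over spatial tokens are all assumptions made purely for illustration, since the abstract does not specify these details.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention; returns (output, weights)."""
    q = queries @ w_q
    k = keys_values @ w_k
    v = keys_values @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)       # each query's weights sum to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
H, W, B = 5, 5, 32        # hypothetical patch height, width, and number of bands
group, d = 8, 16          # hypothetical band-group size and embedding width
patch = rng.random((H, W, B))

# Spectral pathway (assumed tokenization): split the centre pixel's spectrum
# into contiguous band groups and project each group to a d-dimensional token.
centre = patch[H // 2, W // 2]                                     # (B,)
spectral_tokens = centre.reshape(B // group, group) @ rng.normal(size=(group, d))

# Spatial pathway (assumed tokenization): one token per pixel in the patch.
spatial_tokens = patch.reshape(H * W, B) @ rng.normal(size=(B, d))

# Fusion: spectral tokens query the spatial tokens (direction is an assumption).
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
fused, attn = cross_attention(spectral_tokens, spatial_tokens, w_q, w_k, w_v)

# Pool the fused tokens and apply a random linear head as a stand-in classifier.
n_classes = 3
logits = fused.mean(axis=0) @ rng.normal(size=(d, n_classes))
print(fused.shape, attn.shape, logits.shape)  # (4, 16) (4, 25) (3,)
```

In this sketch each of the 4 spectral tokens attends over all 25 spatial tokens, so the fused representation mixes wavelength-dependent and structural information. A trained model would learn the projections rather than sample them randomly, and would likely stack several such layers.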