LGEST: Dynamic Spatial-Spectral Expert Routing for Hyperspectral Image Classification

arXiv cs.CV · 26 Mar 2026


Key Points

  • The paper proposes LGEST, a Local-Global Expert Spatial-Spectral Transformer framework aimed at improving hyperspectral image (HSI) classification by better integrating local/global representations.
  • It introduces a Deep Spatial-Spectral Autoencoder (DSAE) to compress hyperspectral data into compact embeddings while preserving 3D neighborhood coherence and reducing information loss.
  • A Cross-Interactive Mixed Expert Feature Pyramid (CIEM-FPN) uses cross-attention and residual mixture-of-experts to dynamically fuse multi-scale features, adaptively weighting spectral and spatial cues via learnable gating.
  • A Local-Global Expert System (LGES) routes decomposed features to sparsely activated convolutional and transformer expert pairs, using a controller that selects experts based on feature saliency to handle high-dimensional heterogeneity and the Hughes phenomenon.
  • Experiments on four benchmark datasets reportedly show LGEST consistently outperforming existing state-of-the-art HSI classification methods.
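The learnable-gating idea behind CIEM-FPN's residual mixture-of-experts can be illustrated with a minimal numpy sketch. This is an illustrative toy (all names, shapes, and the use of linear maps as experts are assumptions, not the paper's implementation): a gate produces softmax weights per input, the experts' outputs are fused by those weights, and a residual connection preserves the input features.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy setup: 3 experts, each a linear map over d-dimensional
# spectral-spatial embeddings (stand-ins for the paper's expert layers).
d, n_experts = 8, 3
experts = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_experts)]
W_gate = rng.standard_normal((d, n_experts)) / np.sqrt(d)  # learnable gate

def moe_layer(x):
    """Residual mixture-of-experts: gate weights each expert's output."""
    gates = softmax(x @ W_gate)                   # (batch, n_experts), rows sum to 1
    outs = np.stack([x @ E for E in experts], 1)  # (batch, n_experts, d)
    mixed = (gates[..., None] * outs).sum(axis=1) # adaptively weighted fusion
    return x + mixed                              # residual connection

x = rng.standard_normal((4, d))  # 4 pixel embeddings
y = moe_layer(x)
print(y.shape)  # (4, 8)
```

In the paper, the gating functions would weight spectral discriminability against spatial saliency per scale; here a single linear gate over the feature vector stands in for that mechanism.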

Abstract

Deep learning methods, including Convolutional Neural Networks, Transformers, and Mamba, have achieved remarkable success in hyperspectral image (HSI) classification. Nevertheless, existing methods exhibit inflexible integration of local-global representations, inadequate handling of spectral-spatial scale disparities across heterogeneous bands, and susceptibility to the Hughes phenomenon under high-dimensional sample heterogeneity. To address these challenges, we propose the Local-Global Expert Spatial-Spectral Transformer (LGEST), a novel framework that synergistically combines three key innovations. First, LGEST employs a Deep Spatial-Spectral Autoencoder (DSAE) to generate compact yet discriminative embeddings through hierarchical nonlinear compression, preserving 3D neighborhood coherence while mitigating information loss in high-dimensional spaces. Second, a Cross-Interactive Mixed Expert Feature Pyramid (CIEM-FPN) leverages cross-attention mechanisms and residual mixture-of-experts layers to dynamically fuse multi-scale features, adaptively weighting spectral discriminability and spatial saliency through learnable gating functions. Finally, a Local-Global Expert System (LGES) processes decomposed features via sparsely activated expert pairs: convolutional sub-experts capture fine-grained textures, while transformer sub-experts model long-range contextual dependencies, with a routing controller dynamically selecting experts based on real-time feature saliency. Extensive experiments on four benchmark datasets demonstrate that LGEST consistently outperforms state-of-the-art methods.
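The sparse activation described for LGES can be sketched with a minimal top-1 routing example in numpy. This is a hedged illustration, not the paper's controller: a router scores each input against every expert, and only the highest-scoring expert runs per input, so compute scales with the number of activated experts rather than the total. Simple linear maps stand in for the convolutional and transformer sub-experts.

```python
import numpy as np

rng = np.random.default_rng(1)

def top1_route(x, router_w, experts):
    """Sparse routing: each input activates only its highest-scoring expert."""
    scores = x @ router_w           # (batch, n_experts) saliency scores
    choice = scores.argmax(axis=1)  # index of the selected expert per input
    out = np.empty_like(x)
    for i, e in enumerate(choice):
        out[i] = experts[e](x[i])   # only one expert runs per input
    return out, choice

d, n_experts = 8, 4
router_w = rng.standard_normal((d, n_experts))
# Stand-ins for convolutional / transformer sub-experts: linear maps.
mats = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_experts)]
experts = [lambda v, M=M: v @ M for M in mats]

x = rng.standard_normal((6, d))  # 6 feature vectors
out, choice = top1_route(x, router_w, experts)
print(out.shape, choice.shape)  # (6, 8) (6,)
```

A real implementation would typically route to top-k experts with load-balancing losses and mix conv/transformer pairs; the argmax here only shows the core selective-activation idea.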