Deep Spatially-Regularized and Superpixel-Based Diffusion Learning for Unsupervised Hyperspectral Image Clustering

arXiv cs.CV / 4/16/2026

Key Points

  • The paper introduces an unsupervised hyperspectral image (HSI) clustering framework that combines masked deep representation learning with diffusion-based clustering.
  • It first trains an unsupervised masked autoencoder (UMAE) with a Vision Transformer backbone to learn a denoised latent representation. The UMAE accounts for spatial context and long-range spectral correlations, and its masking-based pretraining uses only a small subset of training pixels.
  • It then uses the entropy rate superpixel (ERS) method to segment the image and builds a spatially regularized diffusion graph using Euclidean and diffusion distances computed in the compressed latent space rather than directly in the original HSI space.
  • The proposed DS^2DL approach aims to obtain diffusion distances that better match the intrinsic geometry of the data manifold, improving clustering quality and labeling accuracy.
  • Experiments on the Botswana and KSC datasets are reported to validate the effectiveness of DS^2DL compared with prior diffusion-based approaches.
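
To make the idea of a spatially regularized graph more concrete, here is a minimal NumPy sketch. It assumes one common form of spatial regularization: each node (for example, a superpixel representative) looks for its nearest neighbors in feature space only among nodes that lie within a spatial radius in image coordinates. The summary does not state the exact construction used in DS^2DL, so the function name `spatial_knn_graph` and the parameters `n_neighbors` and `radius` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def spatial_knn_graph(features, coords, n_neighbors=3, radius=2.0):
    """Boolean adjacency of a kNN graph in feature space whose candidate
    neighbors are restricted to nodes within `radius` in image coordinates.

    features: (n, d) array, e.g. latent vectors of superpixel representatives.
    coords:   (n, 2) array of (row, col) positions of the same nodes.
    Hypothetical helper for illustration only.
    """
    n = features.shape[0]
    # Pairwise squared distances in feature space and in image space.
    feat_sq = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    spat_sq = np.sum((coords[:, None, :] - coords[None, :, :]) ** 2, axis=-1)
    # Nodes outside the spatial window (and the node itself) are not candidates.
    masked = np.where(spat_sq <= radius ** 2, feat_sq, np.inf)
    np.fill_diagonal(masked, np.inf)
    adjacency = np.zeros((n, n), dtype=bool)
    for i in range(n):
        nearest = np.argsort(masked[i])[:n_neighbors]
        # Drop placeholders when fewer than n_neighbors candidates exist.
        nearest = nearest[np.isfinite(masked[i, nearest])]
        adjacency[i, nearest] = True
    # Symmetrize so the graph is undirected.
    return adjacency | adjacency.T

# Usage on a toy 4x4 grid of nodes with random 5-dimensional features.
rng = np.random.default_rng(0)
rows, cols = np.mgrid[0:4, 0:4]
coords = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
features = rng.normal(size=(16, 5))
A = spatial_knn_graph(features, coords, n_neighbors=3, radius=1.5)
```

Restricting the neighbor search this way keeps edges between spatially distant nodes out of the graph even when their features happen to be similar, which is the intuition behind spatial regularization.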

Abstract

An unsupervised framework for hyperspectral image (HSI) clustering is proposed that incorporates masked deep representation learning with diffusion-based clustering, extending the Spatially-Regularized Superpixel-based Diffusion Learning (S^2DL) algorithm. Initially, a denoised latent representation of the original HSI is learned via an unsupervised masked autoencoder (UMAE) model with a Vision Transformer backbone. The UMAE takes spatial context and long-range spectral correlations into account and incorporates an efficient pretraining process via masking that utilizes only a small subset of training pixels. In the next stage, the entropy rate superpixel (ERS) algorithm is used to segment the image into superpixels, and a spatially regularized diffusion graph is constructed using Euclidean and diffusion distances within the compressed latent space instead of the HSI space. The proposed algorithm, Deep Spatially-Regularized Superpixel-based Diffusion Learning (DS^2DL), leverages more faithful diffusion distances and subsequent diffusion graph construction that better reflect the intrinsic geometry of the underlying data manifold, improving labeling accuracy and clustering quality. Experiments on Botswana and KSC datasets demonstrate the efficacy of DS^2DL.
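
For readers unfamiliar with diffusion distances, the following sketch computes them directly from their standard definition on a small point set. It builds a Gaussian-weighted kNN graph, row-normalizes it into a Markov transition matrix P, and compares the t-step transition distributions of two points, weighted by the inverse stationary distribution. This is a generic textbook construction under assumed choices (the median-based kernel scale, the function name `diffusion_distances`, and the parameters `n_neighbors` and `t` are all illustrative); it is not the DS^2DL pipeline, which applies diffusion distances to UMAE latent features and a spatially regularized graph.

```python
import numpy as np

def diffusion_distances(X, n_neighbors=5, t=10):
    """Pairwise diffusion distances at integer time t for the rows of X.

    Uses D_t(x, y)^2 = sum_u (p_t(x, u) - p_t(y, u))^2 / pi(u), where p_t is
    the t-step transition probability and pi the stationary distribution.
    Intended for small inputs: it forms dense (n, n, n) intermediates.
    """
    n = X.shape[0]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Indices of each point's k nearest neighbors, skipping the point itself.
    nbrs = np.argsort(sq, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = nbrs.ravel()
    # Assumed heuristic: kernel scale set to the median kNN squared distance.
    sigma2 = np.median(sq[rows, cols])
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-sq[rows, cols] / sigma2)
    W = np.maximum(W, W.T)  # symmetrize the weight matrix
    degree = W.sum(axis=1)
    P = W / degree[:, None]            # Markov transition matrix
    pi = degree / degree.sum()         # stationary distribution of P
    Pt = np.linalg.matrix_power(P, t)  # t-step transition probabilities
    diff = Pt[:, None, :] - Pt[None, :, :]
    return np.sqrt(np.sum(diff ** 2 / pi, axis=-1))

# Usage: two well-separated groups of points in 3-D.
rng = np.random.default_rng(0)
group_a = rng.uniform(0.0, 1.0, size=(15, 3))
group_b = rng.uniform(10.0, 11.0, size=(15, 3))
D = diffusion_distances(np.vstack([group_a, group_b]), n_neighbors=8, t=20)
```

Because the distance compares where random walks started at two points end up after t steps, points in the same densely connected region tend to be close, while points separated by weakly connected regions stay far apart. This is the sense in which diffusion distances can reflect the geometry of the data manifold.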