Survey on Remote Sensing Scene Classification: From Traditional Methods to Large Generative AI Models

arXiv cs.CV / 3/31/2026

Key Points

  • The survey traces how remote sensing scene classification evolved from handcrafted feature methods and classical machine learning toward deep learning and modern transformer/graph-based architectures.
  • It covers recent advances in self-supervised foundation models and vision-language systems, emphasizing strong zero-shot and few-shot performance for remote sensing tasks.
  • The survey highlights generative AI approaches, especially synthetic data generation and improved feature learning, as ways to address long-standing issues such as data scarcity and difficult-to-label scenarios.
  • It analyzes current bottlenecks including high annotation costs, multimodal fusion complexity, interpretability requirements, and ethical concerns.
  • It proposes future research priorities around hyperspectral and multi-temporal modeling, stronger cross-domain generalization, and standardized evaluation protocols to improve scientific comparability.

Abstract

Remote sensing scene classification has experienced a paradigmatic transformation from traditional handcrafted feature methods to sophisticated artificial intelligence systems that now form the backbone of modern Earth observation applications. This comprehensive survey examines the complete methodological evolution, systematically tracing development from classical texture descriptors and machine learning classifiers through the deep learning revolution to current state-of-the-art foundation models and generative AI approaches. We chronicle the pivotal shift from manual feature engineering to automated hierarchical representation learning via convolutional neural networks, followed by advanced architectures including Vision Transformers, graph neural networks, and hybrid frameworks. The survey provides in-depth coverage of breakthrough developments in self-supervised foundation models and vision-language systems, highlighting exceptional performance in zero-shot and few-shot learning scenarios. Special emphasis is placed on generative AI innovations that tackle persistent challenges through synthetic data generation and advanced feature learning strategies. We analyze contemporary obstacles including annotation costs, multimodal data fusion complexities, interpretability demands, and ethical considerations, alongside current trends in edge computing deployment, federated learning frameworks, and sustainable AI practices. Based on comprehensive analysis of recent advances and gaps, we identify key future research priorities: advancing hyperspectral and multi-temporal analysis capabilities, developing robust cross-domain generalization methods, and establishing standardized evaluation protocols to accelerate scientific progress in remote sensing scene classification systems.