ConvVitMamba: Efficient Multiscale Convolution, Transformer, and Mamba-Based Sequence Modelling for Hyperspectral Image Classification
arXiv cs.CV / 4/22/2026
Key Points
- Hyperspectral image (HSI) classification is difficult because of high spectral dimensionality, redundancy, and limited labeled data, motivating more efficient yet accurate sequence/spatial modeling.
- The paper proposes ConvVitMamba, a hybrid architecture combining multiscale convolution for local spectral-spatial patterns, a Vision Transformer tokenization/encoding stage for global context, and a lightweight Mamba-inspired gated sequence-mixing module to avoid costly quadratic self-attention.
- Principal Component Analysis (PCA) preprocessing is used to reduce spectral redundancy and improve overall efficiency.
- Experiments on four benchmarks (including Houston and three UAV-based QUH datasets) show ConvVitMamba consistently outperforming CNN-, Transformer-, and Mamba-based approaches while keeping a favorable accuracy–model size–inference trade-off.
- Ablation studies validate that each of the three components contributes complementarily to the final performance, and the authors release the source code publicly.
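The summary above mentions PCA preprocessing to reduce spectral redundancy but gives no details of the pipeline. As a minimal sketch of that idea (not the paper's actual code; the function name, NumPy dependency, and toy cube shape are assumptions), PCA can project the hundreds of correlated spectral bands of an HSI cube onto a few leading principal components before the network sees the data:

```python
import numpy as np

def pca_reduce(cube, n_components=3):
    """Reduce the spectral dimension of an HSI cube with PCA.

    cube: (H, W, B) array with B spectral bands.
    Returns an (H, W, n_components) array. Illustrative preprocessing
    sketch only, not the authors' exact pipeline.
    """
    H, W, B = cube.shape
    X = cube.reshape(-1, B).astype(np.float64)
    X -= X.mean(axis=0)                        # center each band
    cov = X.T @ X / (X.shape[0] - 1)           # B x B spectral covariance
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :n_components]   # leading principal axes
    return (X @ top).reshape(H, W, n_components)

rng = np.random.default_rng(0)
cube = rng.normal(size=(8, 8, 30))             # toy 8x8 scene with 30 bands
reduced = pca_reduce(cube, n_components=5)
print(reduced.shape)  # (8, 8, 5)
```

Because each pixel's spectrum is projected independently, spatial structure is preserved while the per-pixel feature count drops, which is what makes the downstream model cheaper.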
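The efficiency claim in the key points rests on replacing quadratic self-attention with a Mamba-inspired gated sequence-mixing module. The actual module is not described here; as a hedged, pure-Python sketch of the underlying idea (the function name, scalar state, and gate choice are all illustrative assumptions), a decaying linear recurrence carries context along the token sequence in O(L) time, with an input-dependent gate modulating the output, whereas self-attention's pairwise scores cost O(L^2):

```python
import math

def gated_linear_mix(x, a=0.9, b=0.5):
    """Toy gated linear recurrence in the spirit of Mamba-style mixing.

    x: list of scalar token features. The hidden state h decays by a and
    accumulates b * x_t each step (linear in sequence length), and a
    sigmoid gate on the input scales the emitted value. Illustrative
    only, not the paper's module.
    """
    h = 0.0
    out = []
    for x_t in x:
        h = a * h + b * x_t                   # O(1) state update per token
        gate = 1.0 / (1.0 + math.exp(-x_t))   # input-dependent sigmoid gate
        out.append(gate * h)
    return out

tokens = [1.0, -0.5, 2.0, 0.0]
mixed = gated_linear_mix(tokens)
print(len(mixed))  # 4: one output per input token
```

The one-pass recurrence is why such modules scale to the long spectral-spatial token sequences that make full self-attention expensive on hyperspectral cubes.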