A Data-Centric Vision Transformer Baseline for SAR Sea Ice Classification
arXiv cs.CV / 4/6/2026
Key Points
- The paper proposes a trustworthy SAR-only Vision Transformer baseline for sea-ice classification, explicitly avoiding claims of a fully validated multimodal system.
- It trains ViT-Base and ViT-Large models on the AI4Arctic/ASIP Sea Ice Dataset (v2) using full-resolution Sentinel-1 Extra Wide inputs, leakage-aware stratified patch splitting, SIGRID-3 stage-of-development labels, and normalization statistics computed on the training set only.
- Experiments compare cross-entropy and weighted cross-entropy losses on ViT-Base against focal loss on ViT-Large, targeting the severe class imbalance among morphologically similar ice types.
- ViT-Large with focal loss achieves 69.6% held-out accuracy, 68.8% weighted F1, and strong minority-class performance for Multi-Year Ice (83.9% precision), showing improved precision–recall trade-offs versus weighted cross-entropy.
- The authors position focal-loss ViT results as a cleaner reference point for future fusion work that combines SAR with optical, thermal, or meteorological data.
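
The loss comparison in the key points can be made concrete with a small sketch. The NumPy code below is an illustration, not the authors' implementation: the function names, the `gamma=2.0` default (a commonly used focal-loss setting), and the toy logits are all assumptions, since this summary does not state the paper's hyperparameters. Weighted cross-entropy rescales each sample by a fixed per-class weight, while focal loss scales each sample by `(1 - p_true) ** gamma`, so examples the model already classifies confidently contribute less.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def true_class_prob(logits, targets):
    """Probability assigned to the correct class for each sample."""
    probs = softmax(logits)
    p_true = probs[np.arange(len(targets)), targets]
    # Clip to avoid log(0) if a probability underflows.
    return np.clip(p_true, 1e-12, 1.0)


def weighted_cross_entropy(logits, targets, class_weights):
    """Cross-entropy with a fixed weight per class (weighted mean)."""
    p_true = true_class_prob(logits, targets)
    w = class_weights[targets]
    return float(np.sum(-w * np.log(p_true)) / np.sum(w))


def focal_loss(logits, targets, gamma=2.0):
    """Multi-class focal loss without class weights.

    gamma=2.0 is an assumed default, not a value taken from the paper.
    With gamma=0 this reduces to plain cross-entropy.
    """
    p_true = true_class_prob(logits, targets)
    modulating = (1.0 - p_true) ** gamma  # small for confident samples
    return float(np.mean(-modulating * np.log(p_true)))


# Toy batch (hypothetical): one confident sample, one uncertain sample.
logits = np.array([[4.0, 0.0, 0.0],
                   [0.2, 0.1, 0.0]])
targets = np.array([0, 0])

plain_ce = weighted_cross_entropy(logits, targets, np.ones(3))
focal = focal_loss(logits, targets, gamma=2.0)
```

In this toy batch the focal value is lower than plain cross-entropy because both samples are down-weighted, the confident one much more strongly. Class weights and the focal term can also be combined; whether the paper does so is not stated in this summary.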




