Low-Data Supervised Adaptation Outperforms Prompting for Cloud Segmentation Under Domain Shift
arXiv cs.CV · April 13, 2026
Key Points
- The paper tests the common assumption behind prompting vision-language models for remote sensing: that domain-specific language can steer frozen representations toward cloud segmentation under strong domain shift.
- Across 60 CLIPSeg prompt variants on the CloudSEN12+ benchmark, all prompting methods underperform the zero-shot baseline (0.255 mIoU), with engineered prompts as low as 0.07 mIoU.
- Supervised fine-tuning with extremely little labeled data (0.1% of the training set, roughly 8 images) already surpasses the zero-shot baseline, and 5–10% labeled data recovers about 85% of the best achievable mIoU.
- Full fine-tuning beats low-rank adaptation by 0.03–0.09 mIoU, with the largest improvements for spectrally ambiguous cloud classes.
- The authors observe a “supervision dip” at 0.5–1% labeled data for ambiguous classes that can be hidden in aggregate mIoU, emphasizing the need for per-class monitoring during adaptation.
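The last point is easy to see numerically: a mean over classes can stay respectable while one class collapses. The sketch below is illustrative (the function name, toy masks, and the "thin cloud" label for class 3 are our own, not from the paper) and shows per-class IoU alongside the aggregate mean.

```python
import numpy as np

def per_class_iou(pred, target, num_classes):
    """IoU per class; NaN for classes absent from both prediction and target."""
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union if union > 0 else float("nan"))
    return ious

# Toy 4-class masks: class 3 (say, a spectrally ambiguous thin-cloud class)
# is entirely missed, yet the aggregate mIoU still looks moderate.
target = np.array([[0, 0, 1, 1],
                   [2, 2, 3, 3]])
pred   = np.array([[0, 0, 1, 1],
                   [2, 2, 2, 2]])  # every class-3 pixel predicted as class 2

ious = per_class_iou(pred, target, num_classes=4)
miou = float(np.nanmean(ious))
print(ious)  # [1.0, 1.0, 0.5, 0.0]
print(miou)  # 0.625 — hides the total failure on class 3
```

Tracking the per-class vector rather than only the mean is what reveals a "supervision dip" like the one the authors report at 0.5–1% labeled data.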