OSM-based Domain Adaptation for Remote Sensing VLMs
arXiv cs.CV · March 13, 2026
Key Points
- OSMDA is a self-contained domain adaptation framework for remote sensing Vision-Language Models that eliminates reliance on large teacher models or manual labeling.
- It pairs aerial images with rendered OpenStreetMap tiles of the same location, using the base model's OCR and chart-comprehension abilities to read the rendered maps and generate captions enriched with OSM metadata.
- The model is fine-tuned on satellite imagery alone to produce OSMDA-VLM, achieving state-of-the-art results across 10 benchmarks while being cheaper to train than teacher-dependent approaches.
- The authors will publicly release the dataset and model weights, demonstrating the practicality and scalability of alignment with crowd-sourced geographic data.
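The pairing step described above can be sketched as follows. This is an illustrative reconstruction, not the paper's released code: the function names, the public tile-server URL, and the prompt wording are all assumptions; only the standard slippy-map tile arithmetic is fixed by the OSM tiling scheme.

```python
import math

def deg2num(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int]:
    """Convert WGS84 coordinates to slippy-map tile indices (standard OSM scheme)."""
    lat_rad = math.radians(lat_deg)
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

def tile_url(lat: float, lon: float, zoom: int = 16) -> str:
    """URL of the rendered OSM tile covering (lat, lon) -- assumed tile server."""
    x, y = deg2num(lat, lon, zoom)
    return f"https://tile.openstreetmap.org/{zoom}/{x}/{y}.png"

def caption_prompt(lat: float, lon: float, tags: dict) -> str:
    """Build a captioning prompt that pairs an aerial image with its rendered
    map tile and OSM tags, asking the VLM to read the map (OCR / chart
    comprehension) and describe the aerial view -- hypothetical prompt format."""
    meta = ", ".join(f"{k}={v}" for k, v in tags.items())
    return (
        f"Rendered map tile: {tile_url(lat, lon)}\n"
        f"OSM metadata: {meta}\n"
        "Read the labels on the map tile, then describe the paired aerial image."
    )
```

For example, `caption_prompt(48.8584, 2.2945, {"tourism": "attraction"})` yields a prompt that points the model at the tile covering those coordinates and folds the crowd-sourced tags into the context, which is the kind of self-supervised caption generation the key points attribute to OSMDA.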