Locatability-Guided Adaptive Reasoning for Image Geo-Localization with Vision-Language Models
arXiv cs.CV / 3/17/2026
Key Points
- The paper introduces an Optimized Locatability Score to quantify how suitable an image is for deep reasoning in geo-localization tasks.
- It presents Geo-ADAPT-51K, a locatability-stratified reasoning dataset with augmented reasoning trajectories for complex scenes.
- A two-stage Group Relative Policy Optimization (GRPO) curriculum with customized rewards is proposed to regulate adaptive reasoning depth, visual grounding, and hierarchical geographical accuracy.
- The Geo-ADAPT framework learns an adaptive reasoning policy and reports state-of-the-art results on multiple geo-localization benchmarks while substantially reducing hallucinations.
- The work addresses limitations of retrieval-based and fixed-depth reasoning approaches, enabling more efficient and accurate image geo-localization using vision-language models.
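
The GRPO step named in the key points relies on a group-relative advantage: several answers are sampled for the same input, and each answer's reward is normalized against the rest of its group. The sketch below illustrates only that generic normalization. It is not the paper's implementation: the function name `group_relative_advantages`, the `eps` parameter, the use of the population standard deviation, and the example reward values are all assumptions made for illustration, and the paper's customized rewards for reasoning depth, visual grounding, and hierarchical geographical accuracy are not reproduced here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other samples drawn for the same input.

    Assumption: population standard deviation with a small eps to avoid
    division by zero; real GRPO implementations may differ in these details.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Answers better than the group average get a positive advantage,
    # worse ones a negative advantage; identical rewards all map to 0.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical scalar rewards for four sampled answers to one image.
advantages = group_relative_advantages([0.2, 0.9, 0.4, 0.9])
```

Because the advantages are centered within each group, they sum to approximately zero, so the policy update favors answers that beat their own group rather than answers with a high absolute reward.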




