
Locatability-Guided Adaptive Reasoning for Image Geo-Localization with Vision-Language Models

arXiv cs.CV / 3/17/2026


Key Points

  • The paper introduces an Optimized Locatability Score to quantify how suitable an image is for deep reasoning in geo-localization tasks.
  • It presents Geo-ADAPT-51K, a locatability-stratified reasoning dataset with augmented reasoning trajectories for complex scenes.
  • A two-stage Group Relative Policy Optimization (GRPO) curriculum with customized rewards is proposed to regulate adaptive reasoning depth, visual grounding, and hierarchical geographical accuracy.
  • The Geo-ADAPT framework learns an adaptive reasoning policy and reports state-of-the-art results on multiple geo-localization benchmarks while substantially reducing hallucinations.
  • The work addresses limitations of retrieval-based and fixed-depth reasoning approaches, enabling more efficient and accurate image geo-localization using vision-language models.
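
To make the GRPO step in the key points concrete, here is a minimal sketch of the group-relative advantage that gives the method its name: several responses are sampled for the same image, each is scored, and every score is normalized against its own group. The three reward components and their weights (`w_depth`, `w_ground`, `w_geo`) are hypothetical placeholders, not the paper's reward design, and the use of the population standard deviation is an assumption of this sketch.

```python
from statistics import mean, pstdev


def combined_reward(depth, grounding, geo, w_depth=0.2, w_ground=0.3, w_geo=0.5):
    """Weighted sum of three reward terms (weights are hypothetical)."""
    return w_depth * depth + w_ground * grounding + w_geo * geo


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar per sampled response to the same prompt.
    `eps` avoids division by zero when every response scores the same.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses for one image, scored on (depth, grounding, geo).
group = [
    combined_reward(1.0, 1.0, 1.0),
    combined_reward(0.0, 0.0, 0.0),
    combined_reward(0.5, 0.5, 0.5),
    combined_reward(0.5, 0.5, 0.5),
]
advantages = group_relative_advantages(group)
```

Responses that score above their group average receive a positive advantage and those below receive a negative one, so no separate value network is needed to provide a baseline.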

Abstract

The emergence of Vision-Language Models (VLMs) has introduced new paradigms for global image geo-localization through retrieval-augmented generation (RAG) and reasoning-driven inference. However, RAG methods are constrained by retrieval database quality, while reasoning-driven approaches fail to internalize image locatability, relying on inefficient, fixed-depth reasoning paths that increase hallucinations and degrade accuracy. To overcome these limitations, we introduce an Optimized Locatability Score that quantifies an image's suitability for deep reasoning in geo-localization. Using this metric, we curate Geo-ADAPT-51K, a locatability-stratified reasoning dataset enriched with augmented reasoning trajectories for complex visual scenes. Building on this foundation, we propose a two-stage Group Relative Policy Optimization (GRPO) curriculum with customized reward functions that regulate adaptive reasoning depth, visual grounding, and hierarchical geographical accuracy. Our framework, Geo-ADAPT, learns an adaptive reasoning policy, achieves state-of-the-art performance across multiple geo-localization benchmarks, and substantially reduces hallucinations by reasoning both adaptively and efficiently.
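
The abstract does not spell out how the hierarchical geographical accuracy reward is computed. As an illustration only, the sketch below scores a predicted coordinate by great-circle distance to the ground truth, using the 1, 25, 200, 750, and 2500 km thresholds commonly reported in geo-localization benchmarks (street, city, region, country, continent). The tier reward values and the function name `hierarchical_geo_reward` are hypothetical and are not taken from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0

# (max distance in km, reward). Thresholds follow common benchmark practice;
# the reward values are hypothetical.
TIERS = [(1.0, 1.0), (25.0, 0.8), (200.0, 0.6), (750.0, 0.4), (2500.0, 0.2)]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def hierarchical_geo_reward(pred, truth):
    """Return the reward of the finest tier that contains the prediction."""
    distance = haversine_km(pred[0], pred[1], truth[0], truth[1])
    for max_km, reward in TIERS:
        if distance <= max_km:
            return reward
    return 0.0


paris = (48.8566, 2.3522)
london = (51.5074, -0.1278)
new_york = (40.7128, -74.0060)

same_place = hierarchical_geo_reward(paris, paris)
wrong_city = hierarchical_geo_reward(london, paris)
wrong_continent = hierarchical_geo_reward(new_york, paris)
```

A tiered reward of this kind gives partial credit for identifying the correct country or region even when the exact city is missed, which is one plausible way to express "hierarchical" accuracy as a training signal.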