GeoRouter: Dynamic Paradigm Routing for Worldwide Image Geolocalization

arXiv cs.CV / 3/26/2026

Key Points

  • The paper proposes GeoRouter, a dynamic framework for worldwide image geolocalization that adaptively routes each image query to either a retrieval-based or generation-based paradigm depending on expected performance.
  • It argues that retrieval models tend to excel at fine-grained instance matching, while generation models (using large vision-language models) are stronger at semantic reasoning, making a single approach insufficient for all cases.
  • GeoRouter uses an LVLM backbone to analyze visual content and produce routing decisions, and introduces a distance-aware preference objective that turns relative distance gaps between paradigms into continuous supervision.
  • The work also introduces GeoRouting, described as the first large-scale dataset built specifically for training routing policies; it provides independent predictions from both paradigms.
  • Experiments on IM2GPS3k and YFCC4k show GeoRouter significantly outperforming existing state-of-the-art baselines, supporting the case for exploiting paradigm heterogeneity through routing.
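
The routing step described in the points above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the scalar router score `p_retrieval`, and the 0.5 threshold are all assumptions, and in GeoRouter the score would come from the LVLM backbone. The haversine formula is the standard great-circle distance commonly used to score geolocalization error.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to guard against rounding slightly above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def route(p_retrieval, retrieval_pred, generation_pred, threshold=0.5):
    """Return (paradigm, prediction) for one query.

    p_retrieval is a hypothetical router score in [0, 1]: the estimated
    probability that the retrieval paradigm gives the better prediction.
    """
    if p_retrieval >= threshold:
        return "retrieval", retrieval_pred
    return "generation", generation_pred


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    truth = (48.8584, 2.2945)
    retrieval_pred = (48.8600, 2.2900)
    generation_pred = (45.7640, 4.8357)
    paradigm, pred = route(0.8, retrieval_pred, generation_pred)
    print(paradigm, round(haversine_km(pred, truth), 2), "km error")
```

Each query goes to exactly one paradigm, so the router's job reduces to predicting which of the two independent predictions will land closer to the true location.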

Abstract

Worldwide image geolocalization aims to predict precise GPS coordinates for images captured anywhere on Earth, which is challenging due to the large visual and geographic diversity. Recent methods mainly follow two paradigms: retrieval-based approaches that match queries against a reference database, and generation-based approaches that directly predict coordinates using Large Vision-Language Models (LVLMs). However, we observe distinct error profiles between them: retrieval excels at fine-grained instance matching, while generation offers robust semantic reasoning. This complementary heterogeneity suggests that no single paradigm is universally superior. To harness this potential, we propose GeoRouter, a dynamic routing framework that adaptively assigns each query to the optimal paradigm. GeoRouter leverages an LVLM backbone to analyze visual content and provide routing decisions. To optimize GeoRouter, we introduce a distance-aware preference objective that converts the distance gap between paradigms into a continuous supervision signal, explicitly reflecting relative performance differences. Furthermore, we construct GeoRouting, the first large-scale dataset tailored for training routing policies with independent paradigm predictions. Extensive experiments on IM2GPS3k and YFCC4k demonstrate that GeoRouter significantly outperforms state-of-the-art baselines.
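
One way to picture the distance-aware preference objective is as a soft label derived from the distance gap between the two paradigms' errors. The abstract does not give the functional form, so everything in the sketch below is an assumption: the sigmoid mapping, the temperature `tau_km`, and the binary cross-entropy loss are illustrative choices, and the names are hypothetical.

```python
import math


def soft_target(d_retrieval_km, d_generation_km, tau_km=100.0):
    """Map the distance gap to a continuous preference in (0, 1).

    Values near 1 mean retrieval was much closer to the ground truth,
    values near 0 mean generation was much closer, and 0.5 means a tie.
    The sigmoid and tau_km are assumptions, not the paper's formulation.
    """
    z = (d_generation_km - d_retrieval_km) / tau_km
    z = max(-50.0, min(50.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def routing_loss(p_retrieval, target, eps=1e-7):
    """Binary cross-entropy between the router score and the soft target."""
    p = min(max(p_retrieval, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


if __name__ == "__main__":
    # Retrieval is off by 5 km and generation by 500 km, so the target favors retrieval.
    t = soft_target(5.0, 500.0)
    print(round(t, 3), round(routing_loss(0.9, t), 3), round(routing_loss(0.1, t), 3))
```

Unlike a hard "which paradigm won" label, a continuous target of this kind lets near-ties contribute little to training while large gaps contribute strongly, which matches the abstract's description of supervision that reflects relative performance differences.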