GeoSearch: Augmenting Worldwide Geolocalization with Web-Scale Reverse Image Search and Image Matching

arXiv cs.CV / 4/29/2026


Key Points

  • The paper proposes GeoSearch, an open-world image geolocalization framework that predicts GPS coordinates for images taken anywhere on Earth. It targets a weakness of approaches that rely on a fixed reference database: they often fail on scenes the database does not contain.
  • GeoSearch integrates web-scale reverse image search into a retrieval-augmented generation (RAG) pipeline. It injects both database-retrieved candidate coordinates and textual evidence extracted from web pages into the prompts of a large multimodal model (LMM).
  • To reduce irrelevant or noisy web content, it uses a two-stage filtering strategy: first image matching, then confidence-based gating.
  • Experiments on Im2GPS3k and YFCC4k show that GeoSearch outperforms prior methods in leakage-aware evaluations.
  • The authors release code and data to enable reproducibility and further research.
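The filtering-and-prompting step described in the Key Points can be illustrated with a small sketch. It is not the authors' implementation: the `WebResult` schema, the `filter_web_results` and `build_prompt` helpers, and the threshold values 0.3 and 0.6 are all hypothetical. It also assumes each reverse-image-search hit already carries an image-matching score in [0, 1]. The actual matcher and the actual form of the confidence gate are not specified here. The gate is interpreted, as an assumption, as "use web evidence only if the best surviving match is confident enough, otherwise fall back to database candidates alone."

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


@dataclass
class WebResult:
    """One reverse-image-search hit (hypothetical schema)."""
    url: str
    text: str           # textual evidence extracted from the web page
    match_score: float  # image-matching score vs. the query, in [0, 1] (assumed precomputed)


def filter_web_results(
    results: Sequence[WebResult],
    match_threshold: float = 0.3,  # stage 1 threshold (hypothetical value)
    gate_threshold: float = 0.6,   # stage 2 gate threshold (hypothetical value)
) -> List[WebResult]:
    """Two-stage filter: image matching first, then a confidence gate on the survivors."""
    # Stage 1: drop hits whose image does not match the query image well enough.
    matched = [r for r in results if r.match_score >= match_threshold]
    # Stage 2 (assumed form): keep web evidence only if the strongest surviving
    # match is confident; otherwise discard all web evidence.
    if not matched or max(r.match_score for r in matched) < gate_threshold:
        return []
    return sorted(matched, key=lambda r: r.match_score, reverse=True)


def build_prompt(db_candidates: Sequence[LatLon], web_evidence: Sequence[WebResult]) -> str:
    """Assemble an LMM prompt from database candidates plus any web evidence that passed the gate."""
    lines = ["Predict the GPS coordinates (latitude, longitude) of the image."]
    lines.append("Candidate coordinates retrieved from the reference database:")
    lines += [f"- ({lat:.4f}, {lon:.4f})" for lat, lon in db_candidates]
    if web_evidence:
        lines.append("Evidence from web pages containing matching images:")
        lines += [f"- [{r.url}] {r.text}" for r in web_evidence]
    return "\n".join(lines)


if __name__ == "__main__":
    hits = [
        WebResult("https://example.org/a", "Sunset over the Charles Bridge, Prague", 0.82),
        WebResult("https://example.org/b", "Generic stock photo of a bridge", 0.12),
    ]
    kept = filter_web_results(hits)  # only the first hit survives both stages
    print(build_prompt([(50.0865, 14.4114), (48.2082, 16.3738)], kept))
```

Returning an empty list when the gate stays closed means the prompt degrades gracefully to the database-only RAG baseline, which is one simple way to keep noisy web content out of the LMM's context.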

Abstract

Worldwide image geolocalization, which aims to predict the GPS coordinates of any image on Earth, remains challenging due to global visual diversity. Recent generative approaches based on Retrieval-Augmented Generation (RAG) and Large Multimodal Models (LMMs) leverage candidates retrieved from fixed databases for reasoning, but often struggle with scenes that are absent from the reference set. In this work, we propose GeoSearch, an open-world geolocation framework that integrates web-scale reverse image search into the RAG pipeline. GeoSearch augments LMM prompts with database-retrieved coordinates and textual evidence extracted from web pages. To mitigate noise from irrelevant content, we introduce a two-layer filtering mechanism consisting of image matching, followed by confidence-based gating. Experiments on standard benchmarks Im2GPS3k and YFCC4k demonstrate the superiority of GeoSearch under leakage-aware evaluation. Our code and data are publicly available to support reproducibility.
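Results on Im2GPS3k and YFCC4k are conventionally reported as the fraction of test images whose predicted coordinates fall within fixed great-circle distances of the ground truth. The thresholds commonly used are 1, 25, 200, 750 and 2500 km (street, city, region, country, continent). The sketch below computes that metric with the haversine formula. The threshold list and the 6371 km mean Earth radius are assumed conventions, not details taken from the paper, and the sketch does not reproduce the paper's leakage-aware protocol. The helper names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed convention)
# Street / city / region / country / continent thresholds common in the literature (assumed).
THRESHOLDS_KM = (1.0, 25.0, 200.0, 750.0, 2500.0)

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def accuracy_at_thresholds(
    preds: Sequence[LatLon],
    truths: Sequence[LatLon],
    thresholds: Sequence[float] = THRESHOLDS_KM,
) -> Dict[float, float]:
    """Fraction of predictions lying within each distance threshold of the ground truth."""
    if not preds or len(preds) != len(truths):
        raise ValueError("preds and truths must be non-empty and of equal length")
    dists: List[float] = [haversine_km(p, t) for p, t in zip(preds, truths)]
    return {t: sum(d <= t for d in dists) / len(dists) for t in thresholds}


if __name__ == "__main__":
    preds = [(48.8584, 2.2945), (40.7128, -74.0060)]    # toy predictions
    truths = [(48.8606, 2.3376), (34.0522, -118.2437)]  # toy ground truth
    print(accuracy_at_thresholds(preds, truths))
```

In the toy example the first prediction is roughly 3 km from its target and the second roughly 3900 km, so it counts toward the 25, 200 and 750 km levels but not the 1 km or 2500 km levels.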