SCC-Loc: A Unified Semantic Cascade Consensus Framework for UAV Thermal Geo-Localization

arXiv cs.CV / 4/6/2026


Key Points

  • The paper introduces SCC-Loc, a unified Semantic-Cascade-Consensus framework for UAV thermal geo-localization in GNSS-denied environments, designed to cope with the large thermal-visible modality gap.
  • It uses a shared DINOv2 backbone for both global retrieval and MINIMA_RoMa matching to support zero-shot absolute position estimation while reducing memory footprint.
  • SCC-Loc addresses modality ambiguity and registration errors with three components: Semantic-Guided Viewport Alignment (SGVA) to adaptively refine satellite crop regions, Cascaded Spatial-Adaptive Texture-Structure Filtering (C-SATSF) to enforce geometric consistency and reject cross-modal outliers, and Consensus-Driven Reliability-Aware Position Selection (CD-RAPS) to select the final position through physically constrained pose optimization.
  • To mitigate data scarcity, the authors build the Thermal-UAV dataset with 11,890 thermal queries aligned to large-scale satellite ortho-photos and corresponding DSM.
  • Experiments report a new state of the art: mean localization error drops to 9.37 m, and accuracy within a strict 5 m threshold improves 7.6× over the strongest baseline. Code and data are released on GitHub.

Abstract

Cross-modal Thermal Geo-localization (TG) provides a robust, all-weather solution for Unmanned Aerial Vehicles (UAVs) in Global Navigation Satellite System (GNSS)-denied environments. However, profound thermal-visible modality gaps introduce severe feature ambiguity, systematically corrupting conventional coarse-to-fine registration. To dismantle this bottleneck, we propose SCC-Loc, a unified Semantic-Cascade-Consensus localization framework. By sharing a single DINOv2 backbone across global retrieval and MINIMA_RoMa matching, it minimizes memory footprint and achieves zero-shot, highly accurate absolute position estimation. Specifically, we tackle modality ambiguity by introducing three cohesive components. First, we design the Semantic-Guided Viewport Alignment (SGVA) module to adaptively optimize satellite crop regions, effectively correcting initial spatial deviations. Second, we develop the Cascaded Spatial-Adaptive Texture-Structure Filtering (C-SATSF) mechanism to explicitly enforce geometric consistency, thereby eradicating dense cross-modal outliers. Finally, we propose the Consensus-Driven Reliability-Aware Position Selection (CD-RAPS) strategy to derive the optimal solution through a synergy of physically constrained pose optimization. To address data scarcity, we construct Thermal-UAV, a comprehensive dataset providing 11,890 diverse thermal queries referenced against a large-scale satellite ortho-photo and corresponding spatially aligned Digital Surface Model (DSM). Extensive experiments demonstrate that SCC-Loc establishes a new state of the art, suppressing the mean localization error to 9.37 m and providing a 7.6-fold accuracy improvement within a strict 5-m threshold over the strongest baseline. Code and dataset are available at https://github.com/FloralHercules/SCC-Loc.
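
The two headline numbers, mean localization error and accuracy within a 5 m threshold, can be illustrated with a minimal sketch. It assumes the error is the planar Euclidean distance in metres between predicted and ground-truth positions, and that threshold accuracy is the fraction of queries whose error is at most 5 m. The abstract does not spell out either definition, so the function names and toy coordinates below are hypothetical and are not taken from the SCC-Loc code.

```python
import math

def localization_errors(pred, truth):
    """Planar Euclidean distance (metres) between each predicted and true position.

    Assumption: positions are (x, y) pairs in a local metric frame.
    """
    return [math.hypot(px - tx, py - ty) for (px, py), (tx, ty) in zip(pred, truth)]

def mean_error(errors):
    """Mean localization error over all queries."""
    return sum(errors) / len(errors)

def success_rate(errors, threshold_m=5.0):
    """Fraction of queries localized within threshold_m metres."""
    return sum(e <= threshold_m for e in errors) / len(errors)

# Toy data (hypothetical), not results from the paper.
truth = [(0.0, 0.0), (10.0, 10.0), (20.0, 5.0), (30.0, 0.0)]
pred = [(1.5, 2.0), (10.0, 12.0), (20.0, 35.0), (36.0, 8.0)]

errors = localization_errors(pred, truth)
print("mean error (m):", mean_error(errors))
print("accuracy @ 5 m:", success_rate(errors))
```

Under this reading, a "7.6-fold improvement within a 5 m threshold" would be the ratio of the two methods' success rates at 5 m. The toy data also shows why both metrics are reported: a single large outlier inflates the mean while leaving the threshold accuracy unchanged.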