3D-LENS: A 3D Lifting-based Elevated Novel-view Synthesis method for Single-View Aerial-Ground Re-Identification

arXiv cs.CV / 4/30/2026


Key Points

  • The paper targets the viewpoint-domain gap in aerial-ground re-identification, which makes cross-view retrieval difficult due to occlusion and distortion of discriminative features.
  • It formalizes a harder Single-View AG-ReID (SV AG-ReID) setting where models trained on one real viewpoint must generalize to an unseen viewpoint without target-domain training data.
  • The proposed 3D-LENS framework combines geometrically consistent elevated novel-view synthesis using large-scale 3D mesh reconstruction with representation learning designed to reduce synthetic-to-real bias.
  • The authors claim improved view-consistent synthesis over 2D generative baselines and over prior template-based 3D methods, including better handling of fine-grained details like carried objects.
  • Experiments reportedly achieve state-of-the-art results on SV AG-ReID, with code and data planned for release on GitHub.

Abstract

Aerial-Ground Re-Identification (AG-ReID) is constrained by the viewpoint-domain gap, as drastic viewpoint disparities occlude or distort discriminative features, making cross-viewpoint image retrieval challenging. While existing methods rely on paired cross-view annotations, real-world deployments, such as wilderness search-and-rescue (SAR), often lack target-domain data, requiring retrieval from ground-level references alone. To our knowledge, we are the first to address this challenge by formalizing the Single-View AG-ReID (SV AG-ReID) setting, where models trained on a single real viewpoint must generalize to an unseen viewpoint. We propose 3D Lifting-based Elevated Novel-view Synthesis (3D-LENS), a unified framework that combines geometrically consistent novel-view synthesis, leveraging large-scale 3D mesh reconstruction, with a robust representation learning scheme to mitigate synthetic-to-real bias. Unlike 2D generative baselines that suffer from geometric inconsistencies, or prior 3D methods restricted to class-specific templates that fail to capture fine-grained details such as carried objects, our approach ensures view-consistent synthesis across diverse categories without predefined templates. Extensive experiments demonstrate that our method achieves state-of-the-art performance in SV AG-ReID scenarios. Code and data will be released at https://github.com/TurtleSmoke/3D-LENS.
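The abstract does not spell out the synthesis pipeline, but the core idea of "lifting" a single ground-level view into 3D and re-rendering it from an elevated camera can be illustrated in miniature. The sketch below is hypothetical and uses a simple depth-map back-projection rather than the paper's large-scale mesh reconstruction; all function names and camera parameters are illustrative assumptions, not the authors' code.

```python
import numpy as np

def lift_to_3d(depth, fx, fy, cx, cy):
    """Back-project a depth map into a camera-frame point cloud (pinhole model).
    NOTE: toy stand-in for 3D-LENS's mesh reconstruction, for illustration only."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

def elevated_view(points, elev_deg):
    """Rotate the cloud about the camera x-axis to mimic an elevated (aerial) viewpoint."""
    a = np.radians(elev_deg)
    rot = np.array([[1.0, 0.0, 0.0],
                    [0.0, np.cos(a), -np.sin(a)],
                    [0.0, np.sin(a), np.cos(a)]])
    return points @ rot.T

def project(points, fx, fy, cx, cy):
    """Perspective-project 3D points back to pixel coordinates, dropping points behind the camera."""
    z = points[:, 2]
    valid = z > 1e-6
    u = fx * points[valid, 0] / z[valid] + cx
    v = fy * points[valid, 1] / z[valid] + cy
    return np.stack([u, v], axis=-1)

# Toy example: a flat scene 5 m away, seen from the ground, re-rendered 30 degrees up.
depth = np.full((4, 4), 5.0)
cloud = lift_to_3d(depth, fx=100.0, fy=100.0, cx=2.0, cy=2.0)
pixels = project(elevated_view(cloud, elev_deg=30.0), fx=100.0, fy=100.0, cx=2.0, cy=2.0)
```

The resulting `pixels` are where the original ground-view content would appear in the synthesized elevated view; a real system would also render appearance and handle occlusion, which is where geometrically consistent mesh-based synthesis claims its advantage over 2D generative baselines.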
