PiLoT: Neural Pixel-to-3D Registration for UAV-based Ego and Target Geo-localization

arXiv cs.CV / 3/24/2026


Key Points

  • PiLoT is a unified UAV localization framework that directly registers a live video stream to a geo-referenced 3D map to estimate both ego-pose and target geo-localization, reducing reliance on GNSS and separate active sensors.
  • The system improves real-time performance and accuracy using a Dual-Thread Engine that separates map rendering from the localization core to keep latency low while avoiding drift.
  • It introduces a large synthetic training dataset with precise geometric annotations (camera pose and depth maps) to train a lightweight network that generalizes in a zero-shot way from simulation to real-world data.
  • A Joint Neural-Guided Stochastic-Gradient Optimizer (JNGO) is proposed to maintain robust convergence under aggressive UAV motion.
  • Experiments on multiple public and newly collected benchmarks report state-of-the-art performance while achieving over 25 FPS on an NVIDIA Jetson Orin, with code and dataset released on GitHub.
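The Dual-Thread Engine described in the key points can be pictured as a producer-consumer pattern: a rendering thread keeps producing geo-referenced map views while the localization thread always consumes the freshest one, so slow rendering never blocks pose updates. The sketch below is a minimal illustration of that idea only; all class and variable names are hypothetical, not taken from the PiLoT codebase.

```python
import threading
import queue
import time

class DualThreadEngine:
    """Toy sketch of a dual-thread design: a render thread produces map
    views while the localization thread consumes the most recent one."""

    def __init__(self):
        # maxsize=1: the localization thread always sees the freshest render;
        # stale renders are dropped rather than queued up (bounds latency).
        self.render_queue = queue.Queue(maxsize=1)
        self.poses = []
        self.stop = threading.Event()

    def render_thread(self):
        frame_id = 0
        while not self.stop.is_set():
            view = f"rendered_view_{frame_id}"  # placeholder for a 3D-map render
            try:
                self.render_queue.put_nowait(view)
            except queue.Full:
                pass  # drop the stale render instead of blocking
            frame_id += 1
            time.sleep(0.001)  # simulate rendering cost

    def localization_thread(self, n_frames):
        while len(self.poses) < n_frames:
            view = self.render_queue.get()  # latest available map render
            # placeholder: register the live frame against `view` to get a pose
            self.poses.append(("pose_for", view))

    def run(self, n_frames=5):
        renderer = threading.Thread(target=self.render_thread, daemon=True)
        localizer = threading.Thread(target=self.localization_thread,
                                     args=(n_frames,))
        renderer.start()
        localizer.start()
        localizer.join()
        self.stop.set()
        return self.poses

poses = DualThreadEngine().run(5)
print(len(poses))
```

The single-slot queue is the key design choice here: it turns the rendering thread into a best-effort producer, which is one simple way to keep the localization loop's latency independent of rendering speed.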

Abstract

We present PiLoT, a unified framework that tackles UAV-based ego and target geo-localization. Conventional approaches rely on decoupled pipelines that fuse GNSS and Visual-Inertial Odometry (VIO) for ego-pose estimation, and active sensors such as laser rangefinders for target localization. However, these methods are susceptible to failure in GNSS-denied environments and incur substantial hardware costs and complexity. PiLoT breaks this paradigm by directly registering the live video stream against a geo-referenced 3D map. To achieve robust, accurate, and real-time performance, we introduce three key contributions: 1) a Dual-Thread Engine that decouples map rendering from the core localization thread, ensuring low latency while maintaining drift-free accuracy; 2) a large-scale synthetic dataset with precise geometric annotations (camera poses, depth maps), which enables the training of a lightweight network that generalizes in a zero-shot manner from simulation to real data; and 3) a Joint Neural-Guided Stochastic-Gradient Optimizer (JNGO) that achieves robust convergence even under aggressive motion. Evaluations on a comprehensive set of public and newly collected benchmarks show that PiLoT outperforms state-of-the-art methods while running at over 25 FPS on the NVIDIA Jetson Orin platform. Our code and dataset are available at: https://github.com/Choyaa/PiLoT.
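The general idea behind a neural-guided stochastic-gradient optimizer can be illustrated on a toy problem: a network supplies a coarse pose estimate, which stochastic gradient steps on a reprojection-style loss then refine. The sketch below is not the paper's JNGO formulation; the 2D translation-only setup, the loss, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

true_t = np.array([2.0, -1.0])               # ground-truth 2D camera offset
map_pts = rng.uniform(-5, 5, size=(200, 2))  # geo-referenced landmarks
obs_pts = map_pts + true_t                   # observed landmark positions

def refine_pose(t_init, lr=0.2, steps=100, batch=32):
    """Refine a coarse pose by SGD on a mean-squared reprojection error,
    using a random minibatch of correspondences at each step."""
    t = t_init.copy()
    for _ in range(steps):
        idx = rng.choice(len(map_pts), size=batch, replace=False)
        residual = (map_pts[idx] + t) - obs_pts[idx]  # reprojection-style error
        grad = 2.0 * residual.mean(axis=0)            # gradient of the MSE loss
        t -= lr * grad
    return t

# A coarse "network" prediction plays the role of the neural guidance:
# it places the optimizer inside the basin of convergence.
t_neural = true_t + rng.normal(scale=0.5, size=2)
t_refined = refine_pose(t_neural)
print(np.abs(t_refined - true_t).max() < 1e-3)
```

The division of labor shown here, a learned model for a good initialization and stochastic gradient steps for metric accuracy, is a common way to get convergence under large motion, since the gradient-based refinement alone would fail far from the optimum.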