Sky2Ground: A Benchmark for Site Modeling under Varying Altitude
arXiv cs.CV / 3/17/2026
Key Points
- Sky2Ground is a three-view (satellite, aerial, ground) dataset designed for varying-altitude camera localization, correspondence learning, and reconstruction. It combines synthetic imagery with real-world images across 51 sites to enable evaluation from global to local contexts.
- The work highlights challenges such as satellite imagery degrading pose estimation performance under large altitude variations and reconstruction difficulties due to sparse geometric overlap and noise.
- It benchmarks state-of-the-art pose estimation models (MASt3R, DUSt3R, MapAnything, VGGT) and introduces SkyNet, which uses a curriculum-based training strategy to improve cross-view consistency and achieves gains of 9.6% on RRA@5 and 18.1% on RTA@5.
- Sky2Ground and SkyNet provide a new testbed and baseline for large-scale, multi-altitude 3D perception and camera localization, with code and models to be released publicly.
- The dataset contains thousands of satellite, aerial, and ground images covering wide altitude ranges and near-orthogonal viewing angles.
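
The summary reports gains on RRA@5 and RTA@5 without defining them. In the pose estimation literature these usually denote relative rotation accuracy and relative translation accuracy: the fraction of image pairs whose angular error in relative rotation (or translation direction) falls below 5 degrees. The sketch below illustrates that common definition with NumPy. It is an assumption about how the metrics are computed, not the paper's evaluation code, and the function names (`rot_z`, `rotation_angle_deg`, `translation_angle_deg`, `accuracy_at`) are hypothetical.

```python
import numpy as np

def rot_z(deg):
    """Helper for the example: rotation of `deg` degrees about the z-axis."""
    c, s = np.cos(np.radians(deg)), np.sin(np.radians(deg))
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rotation_angle_deg(R_pred, R_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices."""
    R_rel = R_pred.T @ R_gt
    cos = (np.trace(R_rel) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from round-off.
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def translation_angle_deg(t_pred, t_gt):
    """Angle in degrees between two nonzero translation vectors (scale ignored)."""
    cos = np.dot(t_pred, t_gt) / (np.linalg.norm(t_pred) * np.linalg.norm(t_gt))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def accuracy_at(errors_deg, threshold_deg=5.0):
    """Fraction of pairwise errors strictly below the threshold (e.g. RRA@5)."""
    errors = np.asarray(errors_deg, dtype=float)
    return float(np.mean(errors < threshold_deg))

# Toy example: four image pairs whose predicted relative rotation is off
# by 1, 4, 6, and 10 degrees about the z-axis.
gt = np.eye(3)
rot_errors = [rotation_angle_deg(rot_z(d), gt) for d in (1, 4, 6, 10)]
rra_at_5 = accuracy_at(rot_errors, threshold_deg=5.0)
```

In this toy case two of the four pairs fall under the 5-degree threshold. Some evaluation protocols also resolve the sign ambiguity of the translation direction before measuring the angle; that step is omitted here.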