PanoVGGT: Feed-Forward 3D Reconstruction from Panoramic Imagery
arXiv cs.CV / 3/19/2026
Key Points
- PanoVGGT is a permutation-equivariant Transformer that jointly predicts camera poses, depth maps, and 3D point clouds from one or more panoramas in a single forward pass.
- It uses spherical-aware positional embeddings and a panorama-specific three-axis SO(3) rotation augmentation to enable robust geometric reasoning in the spherical domain.
- To resolve global-frame ambiguity, the method employs a stochastic anchoring strategy during training.
- The work introduces PanoCity, a large outdoor panoramic dataset with dense depth and 6-DoF pose annotations, and reports competitive accuracy and cross-domain generalization. The authors state that code and data will be released.
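
To make the SO(3) rotation augmentation mentioned in the key points more concrete, here is a minimal NumPy sketch of rotating a panorama about all three axes. It is not the authors' implementation: it assumes an equirectangular projection, a y-up axis convention, and nearest-neighbour resampling, and the function names (`equirect_directions`, `directions_to_pixels`, `random_rotation`, `rotate_panorama`) are hypothetical.

```python
import numpy as np

def equirect_directions(height, width):
    """Unit viewing direction for each pixel centre (assumed y-up convention)."""
    lon = (np.arange(width) + 0.5) / width * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (np.arange(height) + 0.5) / height * np.pi
    lon, lat = np.meshgrid(lon, lat)
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1)  # shape (H, W, 3)

def directions_to_pixels(dirs, height, width):
    """Inverse mapping: unit directions to nearest (row, col) indices."""
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])
    lat = np.arcsin(np.clip(dirs[..., 1], -1.0, 1.0))
    col = (lon + np.pi) / (2.0 * np.pi) * width - 0.5
    row = (np.pi / 2.0 - lat) / np.pi * height - 0.5
    col = np.rint(col).astype(int) % width          # longitude wraps around
    row = np.clip(np.rint(row).astype(int), 0, height - 1)
    return row, col

def random_rotation(rng):
    """Random 3x3 rotation matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))      # remove the sign ambiguity of QR
    if np.linalg.det(q) < 0:         # force a proper rotation (det = +1)
        q[:, 0] = -q[:, 0]
    return q

def rotate_panorama(image, rotation):
    """Resample so that content seen along direction s appears along rotation @ s."""
    height, width = image.shape[:2]
    dirs = equirect_directions(height, width)
    # Each output direction d reads the source at rotation.T @ d;
    # for row vectors this is d @ rotation.
    src = dirs @ rotation
    row, col = directions_to_pixels(src, height, width)
    return image[row, col]

rng = np.random.default_rng(0)
pano = rng.random((64, 128, 3))
augmented = rotate_panorama(pano, random_rotation(rng))
```

In a training pipeline, the depth map would presumably be resampled with the same mapping and the camera pose composed with the same rotation so that the supervision stays consistent. The summary does not describe how PanoVGGT handles this, so treat that step as an assumption as well.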