PanoVGGT: Feed-Forward 3D Reconstruction from Panoramic Imagery

arXiv cs.CV / 3/19/2026

Key Points

PanoVGGT is a permutation-equivariant Transformer that jointly predicts camera poses, depth maps, and 3D point clouds from one or more panoramas in a single forward pass.
It uses spherical-aware positional embeddings and a panorama-specific three-axis SO(3) rotation augmentation to enable robust geometric reasoning in the spherical domain.
To resolve global-frame ambiguity, the method employs a stochastic anchoring strategy during training.
The work introduces PanoCity, a large-scale outdoor panoramic dataset with dense depth and 6-DoF pose annotations, and reports competitive accuracy and improved cross-domain generalization. The authors state that code and the dataset will be released.
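
To make the rotation augmentation concrete, here is a minimal NumPy sketch of applying a full three-axis SO(3) rotation to an equirectangular panorama. It is an illustration under stated assumptions, not the authors' implementation: the axis convention (y up, z forward), the nearest-neighbour resampling, the QR-based rotation sampler, and all function names (`pixel_rays`, `rays_to_pixels`, `random_rotation`, `rotate_panorama`) are hypothetical choices made for this example.

```python
import numpy as np

def pixel_rays(height, width):
    """Unit ray direction for every pixel centre of an equirectangular image."""
    # Longitude spans [-pi, pi), latitude spans (-pi/2, pi/2), top row is "up".
    lon = (np.arange(width) + 0.5) / width * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (np.arange(height) + 0.5) / height * np.pi
    lon, lat = np.meshgrid(lon, lat)
    # Assumed convention: x right, y up, z forward (lon = 0).
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1)  # shape (H, W, 3)

def rays_to_pixels(rays, height, width):
    """Inverse of pixel_rays: map unit directions to integer (row, col)."""
    lon = np.arctan2(rays[..., 0], rays[..., 2])
    lat = np.arcsin(np.clip(rays[..., 1], -1.0, 1.0))
    col = np.floor((lon + np.pi) / (2.0 * np.pi) * width).astype(int) % width
    row = np.floor((np.pi / 2.0 - lat) / np.pi * height).astype(int)
    return np.clip(row, 0, height - 1), col

def random_rotation(rng):
    """Sample a rotation matrix via QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))   # remove the sign ambiguity of QR
    if np.linalg.det(q) < 0:      # force a proper rotation (det = +1)
        q[:, 0] = -q[:, 0]
    return q

def rotate_panorama(image, rotation):
    """Resample the panorama so that scene content is rotated by `rotation`."""
    h, w = image.shape[:2]
    rays = pixel_rays(h, w)
    # Each output direction d reads from source direction R^T d.
    # For row vectors, rays @ R computes (R^T d)^T.
    src = rays @ rotation
    row, col = rays_to_pixels(src, h, w)
    return image[row, col]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pano = rng.random((64, 128, 3))
    augmented = rotate_panorama(pano, random_rotation(rng))
    print(augmented.shape)
```

In a training pipeline the ground-truth camera rotation would presumably need to be updated by the same rotation so that image and pose labels stay consistent; how PanoVGGT handles this is not described in this summary.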

Abstract

Panoramic imagery offers a full 360° field of view and is increasingly common in consumer devices. However, it introduces non-pinhole distortions that challenge joint pose estimation and 3D reconstruction. Existing feed-forward models, built for perspective cameras, generalize poorly to this setting. We propose PanoVGGT, a permutation-equivariant Transformer framework that jointly predicts camera poses, depth maps, and 3D point clouds from one or multiple panoramas in a single forward pass. The model incorporates spherical-aware positional embeddings and a panorama-specific three-axis SO(3) rotation augmentation, enabling effective geometric reasoning in the spherical domain. To resolve inherent global-frame ambiguity, we further introduce a stochastic anchoring strategy during training. In addition, we contribute PanoCity, a large-scale outdoor panoramic dataset with dense depth and 6-DoF pose annotations. Extensive experiments on PanoCity and standard benchmarks demonstrate that PanoVGGT achieves competitive accuracy, strong robustness, and improved cross-domain generalization. Code and dataset will be released.
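
The abstract does not spell out how stochastic anchoring works. One plausible reading, sketched below purely as an assumption, is that a random input view is chosen as the reference frame at each training step and all ground-truth poses are re-expressed relative to it, so that no fixed input position is privileged. The function name `reanchor_poses` and the 4x4 camera-to-world pose convention are hypothetical and not taken from the paper.

```python
import numpy as np

def reanchor_poses(cam_to_world, rng):
    """Re-express poses relative to a randomly chosen anchor view.

    cam_to_world: array of shape (N, 4, 4), homogeneous camera-to-world poses.
    Returns the anchor index and poses in which the anchor is the identity.
    NOTE: this is an assumed interpretation of "stochastic anchoring".
    """
    n = cam_to_world.shape[0]
    anchor = int(rng.integers(n))                     # random reference view
    world_to_anchor = np.linalg.inv(cam_to_world[anchor])
    # Broadcasting applies the same 4x4 transform to all N poses.
    return anchor, world_to_anchor @ cam_to_world

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    poses = np.tile(np.eye(4), (3, 1, 1))
    poses[:, :3, 3] = rng.standard_normal((3, 3))     # random translations
    anchor, relative = reanchor_poses(poses, rng)
    print(anchor, np.allclose(relative[anchor], np.eye(4)))
```

Re-anchoring changes only the global frame: relative transforms between any two views are unchanged, which is the property that makes the choice of anchor arbitrary in the first place.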
