HyVGGT-VO: Tightly Coupled Hybrid Dense Visual Odometry with Feed-Forward Models

arXiv cs.RO / 4/3/2026


Key Points

  • HyVGGT-VO proposes a new dense visual odometry (VO) framework that combines the dense mapping/reconstruction capability of feed-forward models with the computational efficiency and high-frequency pose output of traditional sparse VO.
  • An "adaptive hybrid tracking frontend" switches between traditional optical flow and the VGGT tracking head depending on conditions, keeping the cost of dense processing down while preserving robustness.
  • A hierarchical optimization jointly refines the estimated poses and the scale of VGGT predictions, aiming to improve global scale consistency.
  • The paper reports an 85% reduction in average trajectory error on the indoor EuRoC dataset, a 12% improvement on the outdoor KITTI benchmark, and roughly a 5x processing speedup over existing VGGT-based methods.
  • Code is planned for release upon acceptance, making this a practical candidate for addressing the real-time bottleneck in dense SLAM.
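The switching logic of the hybrid frontend can be sketched as a simple gate: use the cheap classical tracker while tracking is healthy, and fall back to the VGGT tracking head when it degrades. The thresholds, signal names, and function below are hypothetical illustrations; the paper does not publish these details.

```python
# Hypothetical sketch of an adaptive hybrid tracking frontend:
# prefer fast optical-flow tracking, fall back to the (expensive but
# robust) VGGT tracking head when tracking quality degrades.
# Threshold values are illustrative assumptions, not from the paper.

def choose_tracker(inlier_ratio: float, mean_flow_px: float,
                   min_inlier_ratio: float = 0.6,
                   max_mean_flow_px: float = 40.0) -> str:
    """Return which tracker to run on the current frame.

    inlier_ratio  -- fraction of tracked features surviving geometric checks
    mean_flow_px  -- average feature displacement; large motion breaks
                     small-baseline optical-flow assumptions
    """
    if inlier_ratio >= min_inlier_ratio and mean_flow_px <= max_mean_flow_px:
        return "optical_flow"  # fast path: classical sparse tracking
    return "vggt_head"         # robust path: feed-forward tracking head
```

For example, a frame with 90% inliers and small motion stays on the fast path, while a low-texture or fast-motion frame triggers the VGGT head.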

Abstract

Dense visual odometry (VO), which provides pose estimation and dense 3D reconstruction, serves as the cornerstone for applications ranging from robotics to augmented reality. Recently, feed-forward models have demonstrated remarkable capabilities in dense mapping. However, when these models are used in dense visual SLAM systems, their heavy computational burden restricts them to yielding sparse pose outputs at keyframes while still failing to achieve real-time pose estimation. In contrast, traditional sparse methods provide high computational efficiency and high-frequency pose outputs, but lack the capability for dense reconstruction. To address these limitations, we propose HyVGGT-VO, a novel framework that combines the computational efficiency of sparse VO with the dense reconstruction capabilities of feed-forward models. To the best of our knowledge, this is the first work to tightly couple a traditional VO framework with VGGT, a state-of-the-art feed-forward model. Specifically, we design an adaptive hybrid tracking frontend that dynamically switches between traditional optical flow and the VGGT tracking head to ensure robustness. Furthermore, we introduce a hierarchical optimization framework that jointly refines VO poses and the scale of VGGT predictions to ensure global scale consistency. Our approach achieves an approximately 5x processing speedup compared to existing VGGT-based methods, while reducing the average trajectory error by 85% on the indoor EuRoC dataset and 12% on the outdoor KITTI benchmark. Our code will be publicly available upon acceptance. Project page: https://geneta2580.github.io/HyVGGT-VO.io.
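The "global scale consistency" step can be illustrated with the standard closed-form least-squares alignment between two depth (or translation-magnitude) sets: the scale s minimizing Σ(d_vo − s·d_vggt)² is s* = Σ(d_vo·d_vggt) / Σ(d_vggt²). This is a generic sketch of one plausible building block, not the paper's actual hierarchical optimizer, and the function name is an assumption.

```python
# Generic least-squares scale alignment (illustrative; the paper's
# hierarchical optimization jointly refines poses and scale, which is
# more involved than this single closed-form step).

def align_scale(vo_values, vggt_values):
    """Return s minimizing sum((vo - s * vggt)**2) over paired samples."""
    num = sum(a * b for a, b in zip(vo_values, vggt_values))
    den = sum(b * b for b in vggt_values)
    return num / den
```

For instance, if the metric-scale VO values are exactly twice the VGGT predictions, the recovered scale is 2.0; in a full system this factor would be re-estimated and propagated within the joint pose/scale refinement.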