Geometric Context Transformer for Streaming 3D Reconstruction

arXiv cs.CV / 4/16/2026

Key Points

  • The paper introduces LingBot-Map, a feed-forward 3D foundation model for streaming 3D reconstruction that uses a geometric context transformer (GCT) architecture inspired by SLAM principles.
  • Its attention mechanism combines anchor context, a pose-reference window, and trajectory memory to improve coordinate grounding, leverage dense geometric cues, and correct long-range drift.
  • The method is designed to keep the streaming state compact while maintaining rich geometric information for stable, efficient inference.
  • The model reportedly runs at around 20 FPS on 518×378 inputs and stays stable over sequences exceeding 10,000 frames.
  • Experiments on multiple benchmarks show the approach outperforms prior streaming methods and iterative optimization-based approaches.
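The reported operating point implies some quick arithmetic about why a compact streaming state matters. The sketch below assumes a ViT-style patch size of 14 pixels, which is common in DINOv2-based 3D models but is not stated in this summary. Under that assumption it computes tokens per frame, wall-clock time for a 10,000-frame sequence at 20 FPS, and how many tokens a naive "keep every frame" history would hold. The helper names are hypothetical.

```python
# Back-of-the-envelope numbers for the reported operating point.
# ASSUMPTION: a ViT-style patch size of 14 px; the summary does not state
# LingBot-Map's actual patch size or tokenizer.
PATCH = 14
WIDTH, HEIGHT = 518, 378
FPS = 20
NUM_FRAMES = 10_000


def tokens_per_frame(width: int, height: int, patch: int) -> int:
    """Image patch tokens for one frame (ignores any camera/register tokens)."""
    assert width % patch == 0 and height % patch == 0, "resolution must tile evenly"
    return (width // patch) * (height // patch)


def wall_clock_seconds(num_frames: int, fps: float) -> float:
    """Time to process a sequence at a constant frame rate."""
    return num_frames / fps


tpf = tokens_per_frame(WIDTH, HEIGHT, PATCH)
secs = wall_clock_seconds(NUM_FRAMES, FPS)
print(f"{tpf} patch tokens per frame")                            # 999
print(f"{secs:.0f} s (~{secs / 60:.1f} min) for {NUM_FRAMES} frames")  # 500 s (~8.3 min)
print(f"{tpf * NUM_FRAMES:,} tokens if every past frame were kept")    # 9,990,000
```

With roughly ten million tokens in a full history, attending over every past frame is impractical. That is the motivation for retaining only a bounded or heavily compressed context.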

Abstract

Streaming 3D reconstruction aims to recover 3D information, such as camera poses and point clouds, from a video stream, which necessitates geometric accuracy, temporal consistency, and computational efficiency. Motivated by the principles of Simultaneous Localization and Mapping (SLAM), we introduce LingBot-Map, a feed-forward 3D foundation model for reconstructing scenes from streaming data, built upon a geometric context transformer (GCT) architecture. A defining aspect of LingBot-Map lies in its carefully designed attention mechanism, which integrates an anchor context, a pose-reference window, and a trajectory memory to address coordinate grounding, dense geometric cues, and long-range drift correction, respectively. This design keeps the streaming state compact while retaining rich geometric context, enabling stable, efficient inference at around 20 FPS on 518×378 resolution inputs over long sequences exceeding 10,000 frames. Extensive evaluations across a variety of benchmarks demonstrate that our approach achieves superior performance compared to both existing streaming and iterative optimization-based approaches.
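The abstract names the three context sources but does not describe how they are built. The toy sketch below shows one plausible way such a state could be organized. It is not LingBot-Map's implementation. It assumes three things:

- The first frame's tokens serve as the anchor.
- A sliding window keeps the most recent frames in full.
- Frames evicted from the window are compressed into a single mean-pooled token in a trajectory memory. Mean-pooling is a stand-in for whatever learned compression the model uses.

`StreamingState`, `attend`, and all sizes are hypothetical. Attention is single-head with no learned projections.

```python
import numpy as np

# Hypothetical toy sizes, chosen for illustration only (not from the paper).
D = 64        # token feature dimension
TOKENS = 16   # tokens per frame
WINDOW = 4    # frames kept in full in the pose-reference window


class StreamingState:
    """Hypothetical compact streaming state: anchor + window + trajectory memory."""

    def __init__(self):
        self.anchor = None  # first frame's tokens: fixes the coordinate frame
        self.window = []    # recent frames in full: dense geometric cues
        self.memory = []    # one pooled token per older frame: long-range context

    def context(self) -> np.ndarray:
        """Concatenate all retained tokens into one attention context."""
        parts = ([self.anchor] if self.anchor is not None else []) + self.window + self.memory
        return np.concatenate(parts, axis=0) if parts else np.empty((0, D))

    def push(self, frame_tokens: np.ndarray) -> None:
        """Add a processed frame, evicting the oldest window frame into memory."""
        if self.anchor is None:
            self.anchor = frame_tokens
            return
        self.window.append(frame_tokens)
        if len(self.window) > WINDOW:
            evicted = self.window.pop(0)
            # ASSUMPTION: mean-pooling stands in for a learned compression.
            self.memory.append(evicted.mean(axis=0, keepdims=True))


def attend(queries: np.ndarray, context: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention; context is both keys and values."""
    scores = queries @ context.T / np.sqrt(queries.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ context


rng = np.random.default_rng(0)
state = StreamingState()
for t in range(100):
    frame = rng.standard_normal((TOKENS, D))
    # The current frame attends to the retained state plus its own tokens.
    out = attend(frame, np.concatenate([state.context(), frame], axis=0))
    state.push(frame)

print(state.context().shape)  # (175, 64): anchor 16 + window 4*16 + memory 95
```

In this sketch the anchor and window have a fixed cost, while the memory grows by one token per frame instead of `TOKENS`. The state therefore stays small relative to the full history, though it is not strictly constant. A real system might also cap or subsample the memory.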