StreetForward: Perceiving Dynamic Street with Feedforward Causal Attention

arXiv cs.CV / 3/23/2026



Key Points

StreetForward introduces a pose-free, tracker-free feedforward framework for dynamic street reconstruction in autonomous driving, enabling rapid scene reconstruction without per-scene optimization.
It augments the Visual Geometry Grounded Transformer with a temporal mask attention module to extract motion information from image sequences and produce motion-aware latent representations.
Static content and dynamic instances are represented using 3D Gaussian Splatting and jointly optimized through cross-frame rendering with spatio-temporal consistency, allowing per-pixel velocity estimation and high-fidelity novel view synthesis at new poses and times.
Trained on the Waymo Open Dataset, StreetForward demonstrates superior performance on novel view synthesis and depth estimation compared with existing methods and shows zero-shot generalization on CARLA and other datasets.
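The summary does not spell out how the temporal mask is built. One plausible reading, suggested by the word "causal" in the title, is a frame-level causal mask: tokens of frame t attend only to tokens of frames up to and including t. The sketch below illustrates that reading in plain Python; the names `temporal_causal_mask` and `masked_softmax` are hypothetical and not taken from the paper.

```python
import math


def temporal_causal_mask(num_frames, tokens_per_frame):
    """Build a frame-level causal mask (an assumed design, not the paper's).

    mask[q][k] is True when query token q may attend to key token k,
    i.e. when k belongs to the same frame as q or to an earlier one.
    Tokens are laid out frame by frame along the sequence axis.
    """
    total = num_frames * tokens_per_frame
    mask = []
    for q in range(total):
        q_frame = q // tokens_per_frame
        mask.append([(k // tokens_per_frame) <= q_frame for k in range(total)])
    return mask


def masked_softmax(scores, allowed):
    """Softmax over one row of attention scores, ignoring disallowed keys.

    Assumes at least one key is allowed, which always holds for the causal
    mask above because every token may attend to its own frame.
    """
    masked = [s if a else float("-inf") for s, a in zip(scores, allowed)]
    peak = max(masked)
    exps = [math.exp(s - peak) for s in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


# Usage: 3 frames with 2 tokens each; the first token of frame 1 (index 2)
# spreads its attention over frames 0 and 1 only.
mask = temporal_causal_mask(num_frames=3, tokens_per_frame=2)
weights = masked_softmax([0.0] * 6, mask[2])
```

With uniform scores, the attention weights for that token are split evenly over the four tokens of frames 0 and 1, and the two tokens of frame 2 receive zero weight.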

Abstract

Feedforward reconstruction is crucial for autonomous driving applications, where rapid scene reconstruction enables efficient utilization of large-scale driving datasets in closed-loop simulation and other downstream tasks, eliminating the need for time-consuming per-scene optimization. We present StreetForward, a pose-free and tracker-free feedforward framework for dynamic street reconstruction. Building upon the alternating attention mechanism from Visual Geometry Grounded Transformer (VGGT), we propose a simple yet effective temporal mask attention module that captures dynamic motion information from image sequences and produces motion-aware latent representations. Static content and dynamic instances are represented uniformly with 3D Gaussian Splatting, and are optimized jointly by cross-frame rendering with spatio-temporal consistency, allowing the model to infer per-pixel velocities and produce high-fidelity novel views at new poses and times. We train and evaluate our model on the Waymo Open Dataset, demonstrating superior performance on novel view synthesis and depth estimation compared to existing methods. Furthermore, zero-shot inference on CARLA and other datasets validates the generalization capability of our approach. More visualizations are available on our project page: https://streetforward.github.io.
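To make the link between per-pixel velocities and rendering at new times concrete, here is a minimal sketch that moves Gaussian centers to a target timestamp. The abstract does not state the motion model, so the constant-velocity assumption and the function name `advect_centers` are illustrative only.

```python
def advect_centers(centers, velocities, t_source, t_target):
    """Move 3D Gaussian centers from t_source to t_target.

    Assumes constant velocity per Gaussian over the interval, which is a
    simplification chosen for illustration. Static content simply carries
    a zero velocity, so static and dynamic Gaussians share one code path.
    """
    dt = t_target - t_source
    return [
        tuple(c + v * dt for c, v in zip(center, velocity))
        for center, velocity in zip(centers, velocities)
    ]


# Usage: one static Gaussian and one moving Gaussian, advanced by 0.5 s.
centers = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)]
velocities = [(0.0, 0.0, 0.0), (2.0, 0.0, -1.0)]
moved = advect_centers(centers, velocities, t_source=0.0, t_target=0.5)
# moved == [(0.0, 0.0, 0.0), (2.0, 2.0, 2.5)]
```

Under this assumption, rendering a novel view at a new time amounts to advecting the centers first and then splatting the Gaussians from the requested camera pose.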


