Mem3R: Streaming 3D Reconstruction with Hybrid Memory via Test-Time Training

arXiv cs.CV / 4/9/2026


Key Points

  • Mem3R is a streaming 3D reconstruction model designed for long video sequences in robotics and augmented reality, aiming to reduce drift and temporal forgetting common in recurrent/state-compressed approaches.
  • It uses a hybrid memory architecture that decouples camera tracking from geometric mapping: camera tracking relies on an implicit fast-weight memory updated via test-time training, while mapping uses an explicit fixed-size token state.
  • Compared with CUT3R, Mem3R improves long-sequence performance and reduces parameter count from 793M to 644M while supporting CUT3R-compatible plug-and-play state update strategies.
  • When integrated with TTT3R, the system cuts Absolute Trajectory Error by up to 39% on 500–1000 frame sequences and maintains constant GPU memory usage with similar inference throughput.
  • The reported gains also transfer to downstream tasks such as video depth estimation and 3D reconstruction.
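The hybrid memory idea above can be illustrated with a toy sketch. This is not Mem3R's implementation: the class names, dimensions, update rules, and loss are all illustrative assumptions. It only shows the two memory types side by side: an implicit fast-weight memory whose weights take one gradient step per incoming frame (test-time training), and an explicit fixed-size token state updated by a gated soft assignment, so memory cost stays constant however long the stream runs.

```python
# Toy sketch of a hybrid streaming memory (illustrative only, not Mem3R's API).
import numpy as np

rng = np.random.default_rng(0)

class FastWeightMemory:
    """Implicit memory: a tiny linear fast-weight layer whose weights are
    updated by one gradient step per frame on a self-supervised
    reconstruction loss (a stand-in for test-time training)."""
    def __init__(self, dim, lr=0.01):
        self.W = np.zeros((dim, dim))
        self.lr = lr

    def update(self, x):
        # Loss L = 0.5 * ||W x - x||^2, so dL/dW = (W x - x) x^T.
        err = self.W @ x - x
        self.W -= self.lr * np.outer(err, x)
        return 0.5 * float(err @ err)

class TokenState:
    """Explicit memory: a fixed-size bank of tokens updated by a gated
    soft assignment, so its footprint is independent of sequence length."""
    def __init__(self, n_tokens, dim, gate=0.1):
        self.tokens = np.zeros((n_tokens, dim))
        self.gate = gate

    def update(self, x):
        # Softmax over token-frame similarity decides how much each
        # token absorbs of the new frame feature.
        scores = self.tokens @ x
        w = np.exp(scores - scores.max())
        w /= w.sum()
        g = self.gate * w[:, None]
        self.tokens = (1.0 - g) * self.tokens + g * x

dim = 8
fast = FastWeightMemory(dim)
state = TokenState(n_tokens=4, dim=dim)

losses = []
for t in range(200):               # simulated per-frame features
    x = rng.standard_normal(dim)
    x /= np.linalg.norm(x)
    losses.append(fast.update(x))  # tracking branch: one TTT step
    state.update(x)                # mapping branch: constant-size state

print(state.tokens.shape)          # fixed size, regardless of stream length
```

The point of the split, as the paper frames it, is that the gradient-updated fast weights can keep absorbing pose-relevant information over long streams, while the explicit token bank gives mapping a bounded, directly readable state.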

Abstract

Streaming 3D perception is well suited to robotics and augmented reality, where long visual streams must be processed efficiently and consistently. Recent recurrent models offer a promising solution by maintaining fixed-size states and enabling linear-time inference, but they often suffer from drift accumulation and temporal forgetting over long sequences due to the limited capacity of compressed latent memories. We propose Mem3R, a streaming 3D reconstruction model with a hybrid memory design that decouples camera tracking from geometric mapping to improve temporal consistency over long sequences. For camera tracking, Mem3R employs an implicit fast-weight memory implemented as a lightweight Multi-Layer Perceptron updated via Test-Time Training. For geometric mapping, Mem3R maintains an explicit token-based fixed-size state. Compared with CUT3R, this design not only significantly improves long-sequence performance but also reduces the model size from 793M to 644M parameters. Mem3R also supports plug-and-play state-update strategies originally developed for CUT3R. Specifically, integrating it with TTT3R decreases Absolute Trajectory Error by up to 39% over the base implementation on 500- to 1000-frame sequences. The resulting improvements also extend to other downstream tasks, including video depth estimation and 3D reconstruction, while preserving constant GPU memory usage and comparable inference throughput. Project page: https://lck666666.github.io/Mem3R/