AI Navigate

FILT3R: Latent State Adaptive Kalman Filter for Streaming 3D Reconstruction

arXiv cs.CV / 3/20/2026


Key Points

  • FILT3R introduces a training-free latent filtering layer that treats recurrent state updates as stochastic state estimation in token space.
  • It maintains a per-token variance and computes a Kalman-style gain that adaptively balances memory retention against new observations. Process noise is estimated online from the EMA-normalized temporal drift of candidate tokens.
  • The approach yields an interpretable update rule that generalizes common overwrite and gating policies as special cases, with gains shrinking in stable regimes and rising when genuine scene changes increase uncertainty.
  • It improves long-horizon stability for depth, pose, and 3D reconstruction in streaming settings compared to existing methods. The update rule is designed as a plug-in, and the authors state that code will be released on GitHub.
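
To make the key points concrete, here is a minimal NumPy sketch of a per-token, scalar Kalman-style update. It is not the authors' implementation. The function name `filter_step`, the parameters `obs_noise`, `ema_decay`, and `eps`, and the choice to measure drift as the mean squared difference between each candidate token and the current state token are all assumptions made for illustration. The paper's exact definitions may differ.

```python
import numpy as np


def filter_step(state, var, candidate, drift_ema,
                obs_noise=1.0, ema_decay=0.9, eps=1e-8):
    """One hypothetical filtering step over a latent state.

    state:      (N, D) current state tokens
    var:        (N,)   per-token variance (uncertainty of the state)
    candidate:  (N, D) candidate tokens proposed by the recurrent model
    drift_ema:  (N,)   running average of per-token drift
    """
    # Per-token drift: how far the candidate moved from the stored state.
    # (Assumed definition; the source only says "temporal drift".)
    drift = np.mean((candidate - state) ** 2, axis=-1)

    # Update the exponential moving average used for normalization.
    drift_ema = ema_decay * drift_ema + (1.0 - ema_decay) * drift

    # Process noise: current drift relative to its running average.
    process_noise = drift / (drift_ema + eps)

    # Predict: uncertainty grows by the process noise.
    var_pred = var + process_noise

    # Kalman-style gain: near 0 keeps memory, near 1 trusts the candidate.
    gain = var_pred / (var_pred + obs_noise)

    # Correct: blend state and candidate, then contract the variance.
    new_state = state + gain[:, None] * (candidate - state)
    new_var = (1.0 - gain) * var_pred
    return new_state, new_var, drift_ema, gain
```

In this sketch, a token whose candidate matches the stored state gets zero process noise, so its variance and gain shrink from step to step. A token whose candidate jumps gets a larger process noise and a gain closer to 1. Setting `obs_noise=0.0` recovers a plain overwrite whenever the predicted variance is positive.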

Abstract

Streaming 3D reconstruction maintains a persistent latent state that is updated online from incoming frames, enabling constant-memory inference. A key source of failure is the state update rule: aggressive overwrites forget useful history, while conservative updates fail to track new evidence, and both behaviors become unstable beyond the training horizon. To address this challenge, we propose FILT3R, a training-free latent filtering layer that casts recurrent state updates as stochastic state estimation in token space. FILT3R maintains a per-token variance and computes a Kalman-style gain that adaptively balances memory retention against new observations. Process noise, which governs how much the latent state is expected to change between frames, is estimated online from the EMA-normalized temporal drift of candidate tokens. Through extensive experiments, we demonstrate that FILT3R yields an interpretable, plug-in update rule that generalizes common overwrite and gating policies as special cases. Specifically, we show that gains shrink in stable regimes as uncertainty contracts with accumulated evidence, and rise when genuine scene change increases process uncertainty, improving long-horizon stability for depth, pose, and 3D reconstruction compared to existing methods. Code will be released at https://github.com/jinotter3/FILT3R.
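
The claim that overwrite and gating policies are special cases can be illustrated with the textbook scalar Kalman recursion, applied independently to each token. The notation here is assumed for illustration and is not taken from the paper: s_t is a state token, c_t its candidate, P_t the per-token variance, Q_t the process noise, and R the observation noise.

```latex
\begin{align}
P_t^{-} &= P_{t-1} + Q_t \\
K_t &= \frac{P_t^{-}}{P_t^{-} + R} \\
s_t &= s_{t-1} + K_t \,(c_t - s_{t-1}) \\
P_t &= (1 - K_t)\, P_t^{-}
\end{align}
```

Under these assumptions, fixing K_t = 1 gives s_t = c_t, a pure overwrite. Fixing K_t = g for a constant g in (0, 1) gives s_t = (1 - g) s_{t-1} + g c_t, a fixed gate. When Q_t = 0, the variance obeys 1/P_t = 1/P_{t-1} + 1/R, so P_t and therefore K_t shrink as evidence accumulates. A large Q_t inflates P_t^{-} and pushes K_t toward 1, which matches the behavior the abstract describes for genuine scene change.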