RAFT-MSF++: Temporal Geometry-Motion Feature Fusion for Self-Supervised Monocular Scene Flow

arXiv cs.CV / 4/22/2026



Key Points

The paper introduces RAFT-MSF++, a self-supervised multi-frame monocular scene flow method designed to overcome the common limitation of using only two-frame inputs.
It recurrently fuses temporal features to jointly estimate depth and 3D scene flow, improving temporal modeling and robustness.
A key component is the Geometry-Motion Feature (GMF), which compactly encodes coupled geometry and motion cues and is iteratively updated for temporal reasoning.
To handle occlusions, the approach uses relative positional attention to inject spatial priors and an occlusion regularization module to propagate reliable motion information from visible regions.
On the KITTI Scene Flow benchmark, the method reports an SF-all (scene flow outlier rate, where lower is better) of 24.14%, a 30.99% improvement over the baseline, along with stronger performance in occluded areas. The code is released on GitHub.
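
For readers unfamiliar with the metric, the sketch below shows how a KITTI-style scene flow outlier rate is typically computed. It is a minimal pure-Python illustration, not the benchmark's official evaluation code. A quantity counts as correct at a pixel if its error is under 3 px or under 5% of the ground-truth magnitude, and the pixel is a scene flow outlier if any of first-frame disparity (D1), second-frame disparity (D2), or optical flow (Fl) fails that test. The function names and the tuple layout are assumptions made for illustration. The last helper additionally assumes the 30.99% figure is a relative error reduction, which this summary does not state explicitly.

```python
import math


def is_outlier(err, gt_mag, abs_thresh=3.0, rel_thresh=0.05):
    """KITTI-style test: correct if err < 3 px OR err < 5% of the ground truth."""
    if err < abs_thresh:
        return False
    # With zero ground-truth magnitude the relative test cannot rescue the pixel.
    return gt_mag == 0 or err / gt_mag >= rel_thresh


def sf_outlier_rate(pixels):
    """Percentage of pixels that are outliers in D1, D2, or Fl.

    Each pixel is a tuple (hypothetical layout):
    (d1_pred, d1_gt, d2_pred, d2_gt, (u_pred, v_pred), (u_gt, v_gt))
    """
    outliers = 0
    for d1_p, d1_g, d2_p, d2_g, fl_p, fl_g in pixels:
        d1_bad = is_outlier(abs(d1_p - d1_g), abs(d1_g))
        d2_bad = is_outlier(abs(d2_p - d2_g), abs(d2_g))
        # Optical flow uses the end-point error between the 2D vectors.
        epe = math.hypot(fl_p[0] - fl_g[0], fl_p[1] - fl_g[1])
        fl_bad = is_outlier(epe, math.hypot(fl_g[0], fl_g[1]))
        if d1_bad or d2_bad or fl_bad:
            outliers += 1
    return 100.0 * outliers / len(pixels)


def implied_baseline(new_rate, relative_gain_pct):
    """Baseline error implied by a relative reduction (assumed interpretation)."""
    return new_rate / (1.0 - relative_gain_pct / 100.0)
```

Under the relative-reduction reading, `implied_baseline(24.14, 30.99)` places the baseline's SF-all at roughly 35%.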

Abstract

Monocular scene flow estimation aims to recover dense 3D motion from image sequences, yet most existing methods are limited to two-frame inputs, restricting temporal modeling and robustness to occlusions. We propose RAFT-MSF++, a self-supervised multi-frame framework that recurrently fuses temporal features to jointly estimate depth and scene flow. Central to our approach is the Geometry-Motion Feature (GMF), which compactly encodes coupled motion and geometry cues and is iteratively updated for effective temporal reasoning. To ensure the robustness of this temporal fusion against occlusions, we incorporate relative positional attention to inject spatial priors and an occlusion regularization module to propagate reliable motion from visible regions. These components enable the GMF to effectively propagate information even in ambiguous areas. Extensive experiments show that RAFT-MSF++ achieves 24.14% SF-all on the KITTI Scene Flow benchmark, with a 30.99% improvement over the baseline and better robustness in occluded regions. The code is available at https://github.com/sunzunyi/RAFT-MSF-PlusPlus.
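
The idea of injecting spatial priors through relative positional attention and propagating motion from visible regions can be illustrated with a toy example. The sketch below is not the authors' architecture. It is a 1D, pure-Python attention step in which each occluded position attends only to visible positions, with a logit made of a content-similarity term plus a bias that depends only on the relative offset between positions. The distance penalty `-alpha * |i - j|`, the scalar features, and the function name `propagate_motion` are all assumptions chosen for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a non-empty list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def propagate_motion(features, motion, visible, alpha=0.5):
    """Fill occluded motion values by attending to visible positions.

    features: per-position scalar features used for content similarity
    motion:   per-position motion estimates (unreliable where occluded)
    visible:  per-position booleans; only visible positions act as keys
    alpha:    hypothetical strength of the relative-position penalty
    """
    n = len(features)
    keys = [j for j in range(n) if visible[j]]
    if not keys:
        return list(motion)  # nothing reliable to propagate from
    out = []
    for i in range(n):
        if visible[i]:
            out.append(motion[i])  # keep reliable estimates untouched
            continue
        # Logit = content similarity + bias depending only on the offset i - j.
        logits = [features[i] * features[j] - alpha * abs(i - j) for j in keys]
        weights = softmax(logits)
        out.append(sum(w * motion[j] for w, j in zip(weights, keys)))
    return out
```

Because the attention weights are non-negative and sum to one, each filled-in value is a weighted average of visible motion values, with nearby and similar-looking positions contributing most.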