RAFT-MSF++: Temporal Geometry-Motion Feature Fusion for Self-Supervised Monocular Scene Flow
arXiv cs.CV / 4/22/2026
Key Points
- The paper introduces RAFT-MSF++, a self-supervised multi-frame monocular scene flow method designed to overcome the common limitation of using only two-frame inputs.
- It recurrently fuses temporal features to jointly estimate depth and 3D scene flow, improving temporal modeling and robustness.
- A key component is the Geometry-Motion Feature (GMF), which compactly encodes coupled geometry and motion cues and is iteratively updated for temporal reasoning.
- To handle occlusions, the approach uses relative positional attention to inject spatial priors and an occlusion regularization module to propagate reliable motion information from visible regions.
- Experiments on the KITTI Scene Flow benchmark report an SF-all of 24.14% (an outlier rate, so lower is better), a 30.99% improvement over the baseline, along with stronger performance in occluded regions. The code is released on GitHub.
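To make "relative positional attention" concrete, here is a minimal, generic sketch of attention with an additive relative-position bias. This is not RAFT-MSF++'s formulation, which the summary does not describe. Assumptions: tokens lie on a 1D grid instead of a 2D feature map, attention is single-head, and the bias is a lookup table `bias` keyed by the relative offset `j - i`. The function name `relative_position_attention` is hypothetical. The bias acts as a spatial prior: it is added to the content-based score, so nearby positions can be favored even when feature similarity is unreliable, as it is in occluded regions.

```python
import math
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def relative_position_attention(
    q: List[Vector], k: List[Vector], v: List[Vector], bias: Dict[int, float]
) -> List[Vector]:
    """Single-head attention whose logits get an additive bias per relative offset.

    Hypothetical illustration: `bias` maps the offset (j - i) to a scalar;
    offsets missing from the table contribute 0.0.
    """
    d = len(q[0])
    out: List[Vector] = []
    for i, qi in enumerate(q):
        # Content score (scaled dot product) plus the spatial prior for offset j - i.
        scores = [
            dot(qi, kj) / math.sqrt(d) + bias.get(j - i, 0.0)
            for j, kj in enumerate(k)
        ]
        weights = softmax(scores)
        # Weighted sum of value vectors, channel by channel.
        out.append(
            [sum(weights[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))]
        )
    return out
```

With all-zero queries and keys, the output is driven entirely by the bias: a flat bias averages every value, while a large bias on offset 0 makes each token attend almost exclusively to itself.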
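For context on the headline number, the sketch below shows how SF-all is computed on KITTI 2015. A pixel counts as an outlier for disparity or optical flow when its error exceeds both 3 px and 5% of the ground-truth magnitude. It is a scene-flow outlier if the first-frame disparity (D1), the second-frame disparity (D2), or the flow (Fl) is an outlier. The sketch assumes the input lists are already restricted to pixels with valid ground truth (real KITTI ground truth is sparse). The function names are illustrative and are not part of the official devkit.

```python
import math
from typing import List, Tuple

Flow = Tuple[float, float]


def is_outlier(err: float, gt_mag: float) -> bool:
    # KITTI 2015 criterion: error above 3 px AND above 5% of the ground-truth magnitude.
    return err > 3.0 and err > 0.05 * gt_mag


def sf_all(
    d1_est: List[float], d1_gt: List[float],
    d2_est: List[float], d2_gt: List[float],
    fl_est: List[Flow], fl_gt: List[Flow],
) -> float:
    """Percentage of (valid) pixels where D1, D2, or Fl is an outlier."""
    n = len(d1_gt)
    outliers = 0
    for i in range(n):
        d1_bad = is_outlier(abs(d1_est[i] - d1_gt[i]), abs(d1_gt[i]))
        d2_bad = is_outlier(abs(d2_est[i] - d2_gt[i]), abs(d2_gt[i]))
        # Flow error is the end-point error (Euclidean distance between 2D vectors).
        epe = math.hypot(fl_est[i][0] - fl_gt[i][0], fl_est[i][1] - fl_gt[i][1])
        fl_bad = is_outlier(epe, math.hypot(fl_gt[i][0], fl_gt[i][1]))
        if d1_bad or d2_bad or fl_bad:
            outliers += 1
    return 100.0 * outliers / n


# Four toy pixels: #1 has a bad D1 (10 px off on 40), #2 has bad flow (10 px EPE),
# and #3 is 4 px off on a disparity of 96, which stays under the 5% threshold.
score = sf_all(
    [10.0, 50.0, 30.0, 100.0], [10.0, 40.0, 30.0, 96.0],
    [20.0, 40.0, 21.0, 95.0], [20.0, 40.0, 20.0, 95.0],
    [(1.0, 1.0), (0.0, 0.0), (10.0, 0.0), (2.0, 2.0)],
    [(1.0, 1.0), (0.0, 0.0), (0.0, 0.0), (2.0, 2.0)],
)
print(score)  # 50.0
```

Because a single failing component marks the whole pixel as wrong, SF-all is stricter than D1, D2, or Fl alone. This strictness is why gains in occluded regions, where flow is hardest to estimate, tend to show up directly in the score.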