Accelerating Transformer-Based Monocular SLAM via Geometric Utility Scoring

arXiv cs.AI / 4/13/2026


Key Points

  • The paper targets inefficiency in Geometric Foundation Model (GFM)-based monocular SLAM, where systems still run costly dense geometric decoding before deciding a frame’s usefulness.
  • It introduces LeanGate, a lightweight feed-forward frame-gating network that predicts a frame’s geometric utility score before the heavy GFM feature extraction and matching.
  • LeanGate is designed as a predictive, plug-and-play module that bypasses over 90% of redundant frames via early rejection.
  • Experiments on standard SLAM benchmarks report more than 85% reduction in tracking FLOPs and about a 5x increase in end-to-end throughput.
  • The approach reportedly maintains the tracking and mapping accuracy of dense baseline methods, suggesting the speed gains do not come at a significant cost in accuracy.
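
To make the early-rejection idea in the points above concrete, here is a minimal sketch of a gate-then-process loop. It is not the paper's implementation. The names `gate_frames`, `utility_score`, `heavy_pipeline`, and the `threshold` value are all hypothetical, and this summary does not say what inputs the real gating network uses or how its cut-off is chosen.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Frame = TypeVar("Frame")
Result = TypeVar("Result")


def gate_frames(
    frames: Iterable[Frame],
    utility_score: Callable[[Frame], float],    # hypothetical stand-in for the gating network
    heavy_pipeline: Callable[[Frame], Result],  # hypothetical stand-in for GFM extraction + matching
    threshold: float = 0.5,                     # assumed cut-off, not taken from the paper
) -> Tuple[List[Result], int]:
    """Run the heavy pipeline only on frames whose predicted score reaches the threshold."""
    results: List[Result] = []
    skipped = 0
    for frame in frames:
        if utility_score(frame) < threshold:
            skipped += 1  # early rejection: the heavy stage never sees this frame
            continue
        results.append(heavy_pipeline(frame))
    return results, skipped


# Toy usage: integers stand in for frames, and every 10th one is "useful".
kept, skipped = gate_frames(
    range(100),
    utility_score=lambda f: 1.0 if f % 10 == 0 else 0.0,
    heavy_pipeline=lambda f: f * 2,
)
print(len(kept), skipped)  # 10 90
```

The point of the structure is that the cheap scoring call is the only work done for a rejected frame, in contrast to post hoc keyframe selection, where the expensive stage has already run by the time the frame is discarded.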

Abstract

Geometric Foundation Models (GFMs) have recently advanced monocular SLAM by providing robust, calibration-free 3D priors. However, deploying these models on dense video streams introduces significant computational redundancy. Current GFM-based SLAM systems typically rely on post hoc keyframe selection. Because of this, they must perform expensive dense geometric decoding simply to determine whether a frame contains novel geometry, resulting in late rejection and wasted computation. To mitigate this inefficiency, we propose LeanGate, a lightweight feed-forward frame-gating network. LeanGate predicts a geometric utility score to assess a frame's mapping value prior to the heavy GFM feature extraction and matching stages. As a predictive plug-and-play module, our approach bypasses over 90% of redundant frames. Evaluations on standard SLAM benchmarks demonstrate that LeanGate reduces tracking FLOPs by more than 85% and achieves a 5x end-to-end throughput speedup. Furthermore, it maintains the tracking and mapping accuracy of dense baselines.
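
As a rough way to see how a high skip rate and a large FLOP reduction can fit together, the sketch below uses a simple cost model of my own, not one given in the abstract: every frame pays for the gate, and only the frames that pass also pay for the heavy stage. The function name `relative_tracking_cost` and the example values are assumptions for illustration only. In particular, the gate cost ratio of 0.05 is made up, and the skip fraction is treated as a share of all frames, which the abstract does not state.

```python
def relative_tracking_cost(skip_fraction: float, gate_cost_ratio: float) -> float:
    """Per-frame cost with gating, as a fraction of an always-decode baseline.

    The heavy stage is normalized to cost 1.0 per frame. Every frame pays
    gate_cost_ratio; only the (1 - skip_fraction) share that passes the gate
    also pays the heavy stage.
    """
    if not 0.0 <= skip_fraction <= 1.0:
        raise ValueError("skip_fraction must be in [0, 1]")
    if gate_cost_ratio < 0.0:
        raise ValueError("gate_cost_ratio must be non-negative")
    return gate_cost_ratio + (1.0 - skip_fraction)


# Illustrative values only (assumed, not reported in the paper):
# skipping 90% of frames with a gate costing 5% of the heavy stage.
cost = relative_tracking_cost(skip_fraction=0.9, gate_cost_ratio=0.05)
print(round(cost, 3))      # 0.15
print(round(1 - cost, 3))  # 0.85
```

Under this model the gate's own cost sets a floor on the savings: if the gate cost as much as the heavy stage, skipping frames would save nothing, which is why the gating network has to be lightweight.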