Breaking the Resource Wall: Geometry-Guided Sequence Modeling for Efficient Semantic Segmentation

arXiv cs.CV / 4/28/2026

Key Points

  • The paper introduces DGM-Net (Directional Geometric Mamba Network), a geometry-guided semantic segmentation model designed to improve accuracy without scaling up backbone size or computation budgets.
  • It proposes Directional Geometric Mamba (G-Mamba), a linear-complexity O(N) sequence/context modeling operator intended as an efficient alternative to modules like ASPP and PPM.
  • To strengthen structural awareness in state space model (SSM)-based processing, the authors develop a DGM-Module that derives centripetal flow fields and topological skeletons to guide scanning and better preserve object boundaries.
  • The method reportedly reaches 80.8% mIoU within 28k training iterations (the abstract does not name the dataset for this figure), 82.3% mIoU on the Cityscapes test set, and 45.24% mIoU on ADE20K, while remaining stable on constrained hardware (e.g., batch size 2 on 8GB VRAM).
  • Overall, the work argues that integrating geometric guidance into SSM-based architectures can yield resource-efficient, high-quality semantic segmentation results.
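To make the "linear-complexity O(N)" claim above concrete, here is a minimal sketch of a generic scalar state space recurrence applied to a flattened feature map. It is not the paper's G-Mamba operator: the function names `ssm_scan` and `flatten_row_major`, the fixed scalar parameters `a`, `b`, `c`, and the plain row-major scan order are all assumptions made for illustration. The point is only that one pass over N tokens with constant work per token gives O(N) cost.

```python
from typing import List, Sequence


def flatten_row_major(feature_map: Sequence[Sequence[float]]) -> List[float]:
    """Turn a 2D map into a 1D token sequence (hypothetical scan order).

    The paper guides its scan with geometric cues; a plain row-major
    order is used here only as a stand-in.
    """
    return [value for row in feature_map for value in row]


def ssm_scan(x: Sequence[float], a: float, b: float, c: float) -> List[float]:
    """Scalar linear state space recurrence, one step per token.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    Each token costs a constant number of operations, so the whole
    sequence costs O(N) for N tokens.
    """
    h = 0.0  # hidden state, carried across the sequence
    outputs = []
    for x_t in x:
        h = a * h + b * x_t
        outputs.append(c * h)
    return outputs


if __name__ == "__main__":
    tokens = flatten_row_major([[1.0, 0.0], [0.0, 0.0]])
    # An impulse at the first token decays geometrically with factor a.
    print(ssm_scan(tokens, a=0.5, b=1.0, c=1.0))
```

Because the state `h` summarizes everything seen so far, every output position can depend on all earlier positions without any pairwise comparison between tokens, which is the usual source of quadratic cost.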

Abstract

High-performance semantic segmentation has achieved significant progress in recent years, often driven by increasingly large backbones and higher computational budgets. While effective, such approaches introduce substantial computational overhead and limit accessibility under constrained hardware settings. In this paper, we propose DGM-Net (Directional Geometric Mamba Network), an efficient architecture that improves modeling capability through structural design rather than increasing model capacity. We introduce Directional Geometric Mamba (G-Mamba), a linear-complexity O(N) operator, as an alternative to conventional context modeling modules such as ASPP and PPM. To further enhance structural awareness in state space model (SSM)-based modeling, we design the DGM-Module, which extracts centripetal flow fields and topological skeletons to guide the scanning process and improve boundary preservation. Without relying on large-scale pretraining or heavy backbone scaling, DGM-Net achieves 80.8% mIoU within 28k iterations, 82.3% mIoU on the Cityscapes test set, and 45.24% mIoU on ADE20K. In addition, the model maintains stable performance under constrained hardware settings (e.g., a batch size of 2 on 8GB VRAM), highlighting its efficiency and practicality. These results demonstrate that incorporating geometric guidance into SSM-based architectures provides an effective and resource-efficient direction for semantic segmentation.
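Since every headline number in the abstract is an mIoU score, a short sketch of how mean intersection-over-union is commonly computed may help in reading them. This is a generic illustration, not the paper's evaluation code: the function name `mean_iou`, the flat label lists, and the `ignore_index=255` default (a common convention for unlabeled pixels) are assumptions.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int, ignore_index: int = 255) -> float:
    """Mean IoU over classes that appear in the prediction or the target.

    For each class k: IoU_k = |pred==k AND target==k| / |pred==k OR target==k|.
    Pixels whose target equals ignore_index are skipped (assumed convention).
    """
    intersection: List[int] = [0] * num_classes
    union: List[int] = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            # A mismatch adds one pixel to the union of both classes involved.
            union[t] += 1
            if 0 <= p < num_classes:
                union[p] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    # Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Benchmark evaluations usually accumulate the per-class intersection and union counts over the whole dataset before dividing, which this function does if all pixels are passed in as one flat sequence.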