Semantic-Fast-SAM: Efficient Semantic Segmenter

arXiv cs.CV / 4/23/2026

Key Points

  • The paper introduces Semantic-Fast-SAM (SFS), a semantic segmentation framework that combines FastSAM (a CNN-based, faster SAM re-implementation) with a semantic labeling pipeline for real-time performance.
  • SFS produces semantic segmentation maps with accuracy comparable to earlier SAM-based approaches at a fraction of the compute and memory of transformer-based SAM pipelines.
  • Experiments report mIoU of about 70.33 on Cityscapes and 48.01 on ADE20K, with roughly 20x faster inference than SSA in the closed-set setting.
  • The method also supports open-vocabulary segmentation through CLIP-based semantic heads and is reported to outperform recent open-vocabulary models on broad class labeling, which matters for practical robotics use cases.
  • An implementation is provided on GitHub, enabling researchers and developers to experiment with real-time “segment-anything” style semantic segmentation.
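
For readers unfamiliar with the mIoU metric quoted above, here is a minimal sketch of how mean intersection-over-union is typically computed from predicted and ground-truth class labels. It is plain Python for clarity and is not taken from the SFS repository; the function name `mean_iou`, the flattened-list input format, and the `ignore_index=255` convention are assumptions made for illustration. Benchmarks commonly report the result as a percentage.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that occur in `pred` or `target`.

    `pred` and `target` are flat sequences of integer class ids, one per pixel.
    Pixels whose ground-truth label equals `ignore_index` are skipped
    (an assumed convention, not something the source specifies).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            # Correct pixel: counts once toward both intersection and union.
            intersection[t] += 1
            union[t] += 1
        else:
            # Wrong pixel: enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    # Average only over classes that actually appear.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {100 * score:.2f}%")
```

In this toy example the per-class IoUs are 1/2, 2/3, and 1, so the mean is 13/18.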

Abstract

We propose Semantic-Fast-SAM (SFS), a semantic segmentation framework that combines the Fast Segment Anything model with a semantic labeling pipeline to achieve real-time performance without sacrificing accuracy. FastSAM is an efficient CNN-based re-implementation of the Segment Anything Model (SAM) that runs much faster than the original transformer-based SAM. Building upon FastSAM's rapid mask generation, we integrate a Semantic-Segment-Anything (SSA) labeling strategy to assign meaningful categories to each mask. The resulting SFS model produces high-quality semantic segmentation maps at a fraction of the computational cost and memory footprint of the original SAM-based approach. Experiments on Cityscapes and ADE20K benchmarks demonstrate that SFS matches the accuracy of prior SAM-based methods (mIoU ~ 70.33 on Cityscapes and 48.01 on ADE20K) while achieving approximately 20x faster inference than SSA in the closed-set setting. We also show that SFS effectively handles open-vocabulary segmentation by leveraging CLIP-based semantic heads, outperforming recent open-vocabulary models on broad class labeling. This work enables practical real-time semantic segmentation with the "segment-anything" capability, broadening the applicability of foundation segmentation models in robotics scenarios. The implementation is available at https://github.com/KBH00/Semantic-Fast-SAM.
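
The idea of assigning a category to each class-agnostic mask can be sketched as a majority vote: every mask takes the most frequent class that a separate per-pixel semantic predictor outputs inside it. The snippet below is a simplified illustration under that assumption, not the authors' implementation. The helper `label_masks`, the representation of a mask as a list of `(row, col)` pixels, and the rule that smaller masks overwrite larger ones on overlap are all hypothetical choices made for this example.

```python
from collections import Counter


def label_masks(masks, semantic_pred, background=-1):
    """Assign one class id to each mask by majority vote.

    masks:         list of masks, each a list of (row, col) pixel coordinates
                   (standing in for the output of a mask generator).
    semantic_pred: 2D list of per-pixel class ids from a semantic predictor.
    Returns a 2D list of class ids; pixels covered by no mask keep `background`.
    """
    height, width = len(semantic_pred), len(semantic_pred[0])
    output = [[background] * width for _ in range(height)]
    # Paint large masks first so that smaller masks win where they overlap
    # (an assumed tie-breaking rule).
    for mask in sorted(masks, key=len, reverse=True):
        if not mask:
            continue
        votes = Counter(semantic_pred[r][c] for r, c in mask)
        label = votes.most_common(1)[0][0]
        for r, c in mask:
            output[r][c] = label
    return output


# Toy 2x3 image: the semantic predictor disagrees with the mask at pixel (1, 1).
semantic_pred = [[0, 0, 1],
                 [0, 1, 1]]
masks = [
    [(0, 0), (0, 1), (1, 0), (1, 1)],  # left region, mostly class 0
    [(0, 2), (1, 2)],                  # right column, class 1
]
print(label_masks(masks, semantic_pred))
```

The vote smooths the noisy prediction at pixel `(1, 1)`: the mask boundary decides the region shape, while the semantic predictor only decides its label.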