Remote SAMsing: From Segment Anything to Segment Everything

arXiv cs.AI / 5/4/2026

Key Points

  • The paper identifies two key challenges when applying SAM2 to large remote-sensing scenes: a quality–coverage trade-off from fixed thresholds and object fragmentation caused by tiling large images.
  • Remote SAMsing is introduced as an open-source pipeline that improves coverage and spatial consistency without modifying SAM2 or requiring any training data.
  • It uses a multi-pass tile inference strategy that iteratively refines segmentation, adaptively relaxing quality thresholds only when additional coverage gains stall, to preserve precise masks first.
  • Contextual padding and a parameter-free best-match merge step reconnect objects split across tile boundaries, restoring spatial consistency.
  • Experiments on seven remote-sensing scenes show coverage increases from 30–68% with single-pass SAM2 to 91–98%, with strong per-class detection (e.g., buildings 95%, cars 82–93% Det@0.5) and results that generalize to false-color imagery and very large mosaics (e.g., a 1.94B-pixel Potsdam mosaic with 97% coverage).
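
The multi-pass strategy in the points above can be illustrated with a minimal sketch. It is not the authors' implementation: `propose` is a hypothetical stand-in for SAM2's mask generator, and the threshold schedule, the `min_gain` stagnation test, and the `max_passes` cap are assumptions chosen for illustration. Pixels are plain `(row, col)` tuples so the example runs with the standard library alone.

```python
from typing import Callable, List, Set, Tuple

Pixel = Tuple[int, int]
Mask = Tuple[float, Set[Pixel]]  # (quality score, pixels covered)


def multipass_segment(
    tile_pixels: Set[Pixel],
    propose: Callable[[Set[Pixel], float], List[Mask]],
    thresholds: List[float],   # assumed schedule, strictest first
    min_gain: float = 0.01,    # assumed stagnation test (fraction of the tile)
    max_passes: int = 20,      # assumed safety cap
) -> Tuple[List[Set[Pixel]], float]:
    """Accumulate masks over repeated passes; relax only when gains stall."""
    remaining = set(tile_pixels)  # pixels not yet "painted black"
    accepted: List[Set[Pixel]] = []
    level = 0
    for _ in range(max_passes):
        if not remaining or level >= len(thresholds):
            break
        before = len(remaining)
        for _score, pixels in propose(remaining, thresholds[level]):
            new = pixels & remaining
            if new:
                accepted.append(new)
                remaining -= new  # hide accepted pixels from the next pass
        gain = (before - len(remaining)) / len(tile_pixels)
        if gain < min_gain:
            level += 1  # coverage stalled: move to the next, looser threshold
    coverage = 1 - len(remaining) / len(tile_pixels)
    return accepted, coverage


def make_toy_proposer(candidates: List[Mask]):
    """Hypothetical mask generator: returns candidates that pass the threshold."""
    def propose(remaining: Set[Pixel], threshold: float) -> List[Mask]:
        return [(s, p & remaining) for s, p in candidates
                if s >= threshold and p & remaining]
    return propose


if __name__ == "__main__":
    tile = {(r, c) for r in range(4) for c in range(4)}
    candidates = [
        (0.95, {(r, c) for r in (0, 1) for c in range(4)}),  # confident mask
        (0.80, {(2, c) for c in range(4)}),                  # medium quality
        (0.60, {(3, 0), (3, 1)}),                            # low quality
    ]
    proposer = make_toy_proposer(candidates)
    print(multipass_segment(tile, proposer, [0.9])[1])
    print(multipass_segment(tile, proposer, [0.9, 0.7, 0.5])[1])
```

On this toy tile, the strict-only run covers half the pixels, while the relaxed schedule reaches 87.5% and still accepts the highest-scoring mask first.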

Abstract

SAM2 produces high-quality zero-shot segmentation on natural images, but applying it to large remote sensing scenes exposes two problems: (1) its mask generator faces an inherent quality-coverage trade-off: strict thresholds yield precise masks but leave most of the image unsegmented, while relaxed thresholds increase coverage at the cost of mask quality; and (2) large images must be tiled, fragmenting objects across tile boundaries. We propose Remote SAMsing, an open-source pipeline that solves both problems without modifying SAM2 or requiring training data. For coverage, a multi-pass algorithm runs SAM2 repeatedly on each tile, painting accepted masks black between passes to simplify the scene for the next iteration, and relaxing quality thresholds only when coverage gains stagnate, ensuring that the most precise masks are always captured first. For spatial consistency, contextual padding and a parameter-free best-match merge reconstruct objects fragmented across tile boundaries. Evaluated on seven scenes (5 cm to 4.78 m GSD), the pipeline raises coverage from 30–68% (single-pass SAM2) to 91–98%. Ablation experiments quantify the contribution of each component to coverage and detection quality. Per-class evaluation shows that SAM2 transfers well to discrete RS objects (buildings 95%, cars 82–93% Det@0.5) with segment boundaries 3–8× more precise than SLIC and Felzenszwalb baselines. Tile size functions as an implicit scale parameter: reducing it from 1,000 to 250 raises Det@0.5 from 56% to 85%, outperforming SAM2's built-in multi-scale mechanism. The pipeline generalizes to MNF false-color imagery without retraining (99.5% ASA) and scales to production-sized images: a 1.94 billion pixel Potsdam mosaic achieved 97% coverage without quality degradation.
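
The spatial-consistency step described in the abstract can be sketched as follows. The abstract does not spell out the merge rule, so this is only one plausible reading: tiles are read with extra context (`pad`) around their core window, and two segments from neighboring tiles are joined when each is the other's largest-overlap partner in the shared padding zone. The function names `padded_tiles` and `best_match_merge`, the mutual-best-overlap rule, and the window layout are all assumptions made for illustration, not the pipeline's actual API.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Pixel = Tuple[int, int]               # (row, col) in full-image coordinates
Window = Tuple[int, int, int, int]    # (row0, col0, row1, col1), end-exclusive


def padded_tiles(height: int, width: int, tile: int,
                 pad: int) -> Iterator[Tuple[Window, Window]]:
    """Yield (core, padded) windows; padding is clipped at the image border."""
    for r in range(0, height, tile):
        for c in range(0, width, tile):
            core = (r, c, min(r + tile, height), min(c + tile, width))
            padded = (max(r - pad, 0), max(c - pad, 0),
                      min(core[2] + pad, height), min(core[3] + pad, width))
            yield core, padded


def best_match_merge(a: Dict[str, Set[Pixel]],
                     b: Dict[str, Set[Pixel]]) -> List[Set[Pixel]]:
    """Join segments of two neighboring tiles by mutual best overlap.

    Assumed rule: no threshold is used; a pair is merged only when each
    segment is the other's largest-overlap partner.
    """
    def best(seg: Set[Pixel], others: Dict[str, Set[Pixel]]) -> Optional[str]:
        scored = [(len(seg & px), oid) for oid, px in others.items() if seg & px]
        return max(scored)[1] if scored else None

    best_ab = {ia: best(pa, b) for ia, pa in a.items()}
    best_ba = {ib: best(pb, a) for ib, pb in b.items()}

    merged: List[Set[Pixel]] = []
    used_b: Set[str] = set()
    for ia, pa in a.items():
        ib = best_ab[ia]
        if ib is not None and best_ba[ib] == ia:
            merged.append(pa | b[ib])   # same object seen from both tiles
            used_b.add(ib)
        else:
            merged.append(set(pa))
    merged.extend(set(pb) for ib, pb in b.items() if ib not in used_b)
    return merged


if __name__ == "__main__":
    print(list(padded_tiles(height=4, width=8, tile=4, pad=1)))
    # A building straddling the boundary between two tiles, plus an unrelated car.
    left = {"L1": {(0, 2), (0, 3), (0, 4)}}
    right = {"R1": {(0, 3), (0, 4), (0, 5)}, "R2": {(2, 6)}}
    print(best_match_merge(left, right))
```

In the example, the two halves of the building overlap in the padding zone and are merged into one segment, while the car, which has no partner in the left tile, is kept unchanged. A fuller version would also need to resolve unmerged segments that still overlap inside the padding.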