PanoSAMic: Panoramic Image Segmentation from SAM Feature Encoding and Dual View Fusion
arXiv cs.CV / 4/27/2026
Key Points
- The paper argues that existing image foundation models underperform on spherical panoramic images because they are largely trained on perspective imagery.
- PanoSAMic adapts the pre-trained Segment Anything (SAM) encoder so that it emits multi-stage features suitable for semantic segmentation of panoramic images.
- It introduces a spatio-modal fusion module that dynamically selects relevant modalities and features per region, improving robustness across different input types.
- To handle panorama-specific challenges such as spherical distortion and left-right edge discontinuity, the model’s decoder uses spherical attention and dual-view fusion.
- The authors report state-of-the-art results on Stanford2D3DS (for RGB, RGB-D, and RGB-D-N) and strong performance on Matterport3D (for RGB and RGB-D), and provide an implementation link.
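The summary does not spell out how the multi-stage features are obtained, but a common way to get them from a frozen ViT-style encoder like SAM's is to tap intermediate block outputs. The block below is a minimal sketch of that idea; the toy "blocks" and the tap indices are hypothetical stand-ins, not the paper's actual configuration.

```python
import numpy as np

def run_encoder_with_taps(x, blocks, tap_indices):
    """Run a stack of encoder blocks, collecting intermediate
    (multi-stage) features at the given block indices."""
    feats = []
    for i, block in enumerate(blocks):
        x = block(x)
        if i in tap_indices:
            feats.append(x)
    return feats

# Toy "blocks": fixed linear maps standing in for frozen encoder blocks.
rng = np.random.default_rng(0)
dim = 8
blocks = [
    (lambda W: (lambda x: np.tanh(x @ W)))(rng.standard_normal((dim, dim)) * 0.1)
    for _ in range(12)
]
tokens = rng.standard_normal((16, dim))            # 16 tokens, dim-8 embedding
stages = run_encoder_with_taps(tokens, blocks, tap_indices={2, 5, 8, 11})
print([f.shape for f in stages])                   # four (16, 8) feature maps
```

A segmentation decoder would then consume these four stage outputs at different resolutions, rather than only the encoder's final output.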
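The spatio-modal fusion module is described as dynamically selecting relevant modalities per region. One plausible reading (a sketch under that assumption, not the paper's implementation) is a per-pixel softmax gate over per-modality feature maps, so the blend of RGB, depth, and normal features can differ at every location:

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatio_modal_fusion(feats, gate_logits):
    """feats: (M, H, W, C) per-modality features (e.g. RGB, depth, normals).
    gate_logits: (M, H, W) per-pixel modality-relevance scores
    (in a real model these would be predicted by a small network).
    Returns an (H, W, C) map: a convex combination of the modalities,
    weighted independently at every spatial location."""
    gates = softmax(gate_logits, axis=0)           # (M, H, W), sums to 1 over M
    return np.einsum('mhw,mhwc->hwc', gates, feats)

rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 4, 4, 8))          # three modalities, 4x4, C=8
logits = rng.standard_normal((3, 4, 4))
fused = spatio_modal_fusion(feats, logits)
print(fused.shape)                                 # (4, 4, 8)
```

Because the gate is a softmax, dropping a modality at test time can be approximated by masking its logits, which is one way such designs stay robust across RGB, RGB-D, and RGB-D-N inputs.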
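Dual-view fusion targets the artificial seam that equirectangular projection creates at the left/right image border. A minimal sketch of the general technique (the paper's exact fusion rule may differ) is to predict on the original view and on a horizontally rolled copy, roll the second prediction back, and average:

```python
import numpy as np

def dual_view_predict(img, predict, shift=None):
    """Equirectangular images wrap horizontally, so a model sees a fake
    seam at the left/right border. Run `predict` on the original view and
    on a rolled copy (seam moved to the image centre), undo the roll on
    the second prediction, and average the two."""
    w = img.shape[1]
    shift = w // 2 if shift is None else shift
    p1 = predict(img)
    p2 = np.roll(predict(np.roll(img, shift, axis=1)), -shift, axis=1)
    return 0.5 * (p1 + p2)

img = np.arange(24, dtype=float).reshape(2, 12, 1)  # toy 2x12 "panorama"
out = dual_view_predict(img, predict=lambda x: x)   # identity stands in for a model
print(np.allclose(out, img))                        # True
```

With a real segmentation model, each pixel thus receives at least one prediction made far from any border, which is what smooths out edge discontinuities.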