
HFP-SAM: Hierarchical Frequency Prompted SAM for Efficient Marine Animal Segmentation

arXiv cs.CV / 3/16/2026


Key Points

  • The paper presents HFP-SAM, a hierarchical framework for marine animal segmentation that leverages the Segment Anything Model (SAM) for enhanced segmentation in complex marine environments.
  • It introduces a Frequency Guided Adapter to inject marine-scene information into the frozen SAM backbone via frequency-domain prior masks.
  • It proposes a Frequency-aware Point Selection module to generate highlighted regions through frequency analysis and feed those points as prompts to SAM's decoder for refined predictions.
  • It adds a Full-View Mamba module to efficiently extract spatial and channel contextual information with linear computational complexity to produce comprehensive segmentation masks.
  • The authors report superior performance on four public datasets and release their source code publicly.
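The frequency-aware point selection idea can be illustrated with a minimal sketch (this is not the paper's exact FPS module; the function name, cutoff parameter, and top-k selection are assumptions for illustration): high-pass filter the image in the frequency domain, map the residual energy back to the spatial domain, and take the strongest responses as candidate point prompts.

```python
import numpy as np

def frequency_point_prompts(image, num_points=5, cutoff=0.1):
    """Illustrative sketch: derive point prompts from high-frequency
    image content. `image` is a 2-D grayscale array; returns the
    (row, col) coordinates of the `num_points` strongest responses."""
    h, w = image.shape
    # Shift the spectrum so low frequencies sit at the center.
    spectrum = np.fft.fftshift(np.fft.fft2(image))

    # High-pass mask: suppress a central low-frequency disc.
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - h / 2, xx - w / 2)
    high_pass = radius > cutoff * min(h, w)

    # Back to the spatial domain: magnitude = high-frequency energy map.
    energy = np.abs(np.fft.ifft2(np.fft.ifftshift(spectrum * high_pass)))

    # Keep the num_points strongest responses as candidate point prompts.
    flat = np.argsort(energy, axis=None)[-num_points:]
    return np.stack(np.unravel_index(flat, energy.shape), axis=1)

# Example: a sharp-edged square on a flat background; the selected
# points concentrate around the high-frequency (edge) regions.
img = np.zeros((64, 64))
img[20:40, 20:40] = 1.0
points = frequency_point_prompts(img, num_points=3)
```

In the paper's pipeline, such points would then be combined with SAM's coarse prediction and fed to its prompt-driven decoder; here they are shown in isolation.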

Abstract

Marine Animal Segmentation (MAS) aims to identify and segment marine animals in complex marine environments. Most previous deep learning-based MAS methods struggle with long-distance modeling. Recently, the Segment Anything Model (SAM) has gained popularity in general image segmentation; however, it fails to perceive fine-grained details and frequency information. To this end, we propose a novel learning framework, named Hierarchical Frequency Prompted SAM (HFP-SAM), for high-performance MAS. First, we design a Frequency Guided Adapter (FGA) that efficiently injects marine-scene information into the frozen SAM backbone through frequency-domain prior masks. Additionally, we introduce a Frequency-aware Point Selection (FPS) module to generate highlighted regions through frequency analysis. These regions are combined with SAM's coarse predictions to generate point prompts, which are integrated into SAM's decoder for fine predictions. Finally, to obtain comprehensive segmentation masks, we introduce a Full-View Mamba (FVM) module that efficiently extracts spatial and channel contextual information with linear computational complexity. Extensive experiments on four public datasets demonstrate the superior performance of our approach. The source code is publicly available at https://github.com/Drchip61/TIP-HFP-SAM.
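The "frequency-domain prior masks" mentioned for the FGA can be pictured with a simple low/high split (a hedged sketch under assumed details, not the paper's FGA implementation): a radial low-pass in the Fourier domain yields a smooth component, and the residual carries the high-frequency structure an adapter could use to modulate frozen backbone features.

```python
import numpy as np

def frequency_prior_masks(feature, cutoff=0.25):
    """Illustrative sketch: split a 2-D feature map into low- and
    high-frequency components via a radial mask in the Fourier domain.
    The cutoff fraction is an assumed hyperparameter."""
    h, w = feature.shape
    spectrum = np.fft.fftshift(np.fft.fft2(feature))

    # Radial low-pass mask centered on the (shifted) DC component.
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - h / 2, xx - w / 2)
    low_mask = radius <= cutoff * min(h, w)

    # Low-frequency component; the residual is exactly the rest,
    # so low + high reconstructs the input by construction.
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = feature - low
    return low, high
```

An adapter-style design might then fuse these components with the backbone's intermediate features (e.g. via lightweight convolutions), keeping the SAM weights frozen; that fusion step is specific to the paper and is not reproduced here.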