Language-Guided Structure-Aware Network for Camouflaged Object Detection

arXiv cs.CV / 3/26/2026


Key Points

  • The paper tackles camouflaged object detection (COD), where objects blend into the background by color, texture, and structure, making segmentation especially difficult.
  • It proposes a Language-Guided Structure-Aware Network (LGSAN) that uses CLIP with text prompts to generate guidance masks, steering a visual backbone (PVT-v2) toward likely camouflaged regions.
  • LGSAN improves visual feature quality by adding a Fourier Edge Enhancement Module (FEEM) to emphasize high-frequency edge information in the frequency domain.
  • It further refines structure and boundaries via a Structure-Aware Attention Module (SAAM) and a Coarse-Guided Local Refinement Module (CGLRM) for finer reconstruction.
  • Experiments on multiple COD datasets show competitive performance, supporting the method’s effectiveness and robustness.
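The Fourier edge-enhancement idea above (FEEM) rests on a standard observation: object edges correspond to high-frequency components of a feature map, so filtering in the frequency domain can isolate them. A minimal sketch of that mechanism, using NumPy and a simple radial high-pass filter (the function name, cutoff, and residual weighting are illustrative assumptions, not the paper's actual module):

```python
import numpy as np

def fourier_edge_enhance(feat, cutoff_ratio=0.25, alpha=1.0):
    """Edge enhancement in the frequency domain (illustrative sketch).

    feat: 2D feature map of shape (H, W). A radial high-pass mask keeps
    frequency bins beyond `cutoff_ratio` of the spectrum radius; the
    filtered result is added back to the input as an edge residual.
    """
    h, w = feat.shape
    # Shift the zero-frequency (DC) bin to the centre of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(feat))

    # Radial distance of each frequency bin from the spectrum centre.
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.hypot(yy - h / 2, xx - w / 2)
    high_pass = (dist > cutoff_ratio * min(h, w) / 2).astype(float)

    # Keep only high frequencies, transform back, take the real part.
    edges = np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * high_pass)))
    return feat + alpha * edges
```

On a constant (edge-free) input the high-pass branch contributes nothing, so the map passes through unchanged; on textured input the residual sharpens transitions. The actual FEEM operates on multi-scale backbone features rather than a single 2D map.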

Abstract

Camouflaged Object Detection (COD) aims to segment objects that blend into the background in color, texture, and structure, making it a highly challenging task in computer vision. Although existing methods introduce multi-scale fusion and attention mechanisms to alleviate these challenges, they generally lack the guidance of textual semantic priors, which limits the model's ability to focus on camouflaged regions in complex scenes. To address this issue, this paper proposes a Language-Guided Structure-Aware Network (LGSAN). Specifically, on top of the visual backbone PVT-v2, we introduce CLIP to generate masks from text prompts and RGB images, thereby guiding the multi-scale features extracted by PVT-v2 toward potential target regions. On this foundation, we further design a Fourier Edge Enhancement Module (FEEM), which integrates multi-scale features with high-frequency information in the frequency domain to extract edge-enhanced features. Furthermore, we propose a Structure-Aware Attention Module (SAAM) to strengthen the model's perception of object structures and boundaries. Finally, we introduce a Coarse-Guided Local Refinement Module (CGLRM) to improve fine-grained reconstruction and boundary integrity of camouflaged object regions. Extensive experiments demonstrate that our method consistently achieves highly competitive performance across multiple COD datasets, validating its effectiveness and robustness.
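The language-guidance step can be understood as scoring every spatial location by its agreement with a text embedding. A minimal sketch of that idea, assuming CLIP-style per-pixel visual features and a prompt embedding already live in a shared space (the function name, normalisation scheme, and inputs are hypothetical stand-ins, not the paper's implementation):

```python
import numpy as np

def text_guided_mask(pixel_feats, text_emb):
    """Spatial guidance mask from text-visual similarity (illustrative).

    pixel_feats: (H, W, C) per-pixel visual features.
    text_emb:    (C,) embedding of a prompt, e.g. "a camouflaged animal".
    Returns a (H, W) mask in [0, 1]: high where features align with the
    text, low elsewhere. Both inputs are assumed to be CLIP-style
    embeddings in a shared space.
    """
    # L2-normalise so the dot product is a cosine similarity.
    v = pixel_feats / (np.linalg.norm(pixel_feats, axis=-1, keepdims=True) + 1e-8)
    t = text_emb / (np.linalg.norm(text_emb) + 1e-8)
    sim = v @ t  # (H, W) cosine-similarity map
    # Min-max normalise to [0, 1] for use as a soft guidance mask.
    return (sim - sim.min()) / (sim.max() - sim.min() + 1e-8)
```

Multiplying backbone features by such a mask suppresses background activations and concentrates subsequent attention on likely camouflaged regions, which is the role the CLIP-derived masks play for PVT-v2 features in LGSAN.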