Modality-Specific Hierarchical Enhancement for RGB-D Camouflaged Object Detection

arXiv cs.CV / 4/6/2026


Key Points

  • RGB-D camouflaged object detection is difficult because targets closely resemble backgrounds, and existing methods often fuse RGB and depth features without enough modality-specific enhancement.
  • The paper introduces MHENet, which adds a Texture Hierarchical Enhancement Module (THEM) to boost subtle high-frequency texture cues and a Geometry Hierarchical Enhancement Module (GHEM) to strengthen geometric structure via learnable gradient extraction.
  • MHENet uses an Adaptive Dynamic Fusion Module (ADFM) that fuses the enhanced texture and geometry representations using spatially varying weights to improve cross-modal fusion quality.
  • Experiments on four benchmarks show MHENet outperforms 16 state-of-the-art methods both qualitatively and quantitatively, and the code is released on GitHub.
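The two enhancement ideas above can be illustrated with simple fixed-kernel stand-ins: a high-pass response (feature minus local mean) for the texture cue that THEM amplifies, and a Sobel gradient magnitude on the depth map for the geometric cue that GHEM extracts. This is a hedged sketch of the underlying signal-processing intuition, not the paper's modules; MHENet's actual THEM and GHEM use learnable, hierarchical variants of these operations.

```python
import numpy as np

def high_frequency(feat, k=3):
    """Texture cue: feature map minus its k-by-k local mean, a simple
    high-pass filter. Illustrative analogue of what THEM boosts."""
    pad = k // 2
    padded = np.pad(feat, pad, mode="edge")
    h, w = feat.shape
    local_mean = np.zeros_like(feat, dtype=float)
    for i in range(h):
        for j in range(w):
            local_mean[i, j] = padded[i:i + k, j:j + k].mean()
    return feat - local_mean

def gradient_magnitude(depth):
    """Geometry cue: Sobel gradient magnitude of a depth map, a
    fixed-kernel stand-in for GHEM's learnable gradient extraction."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T  # vertical Sobel kernel
    padded = np.pad(depth, 1, mode="edge")
    h, w = depth.shape
    gx = np.zeros_like(depth, dtype=float)
    gy = np.zeros_like(depth, dtype=float)
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy)
```

On a textureless region the high-pass response vanishes, while a depth discontinuity (e.g. an object boundary against the background) produces a strong gradient magnitude, which is exactly where depth helps when RGB appearance is camouflaged.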

Abstract

Camouflaged object detection (COD) is challenging due to high target-background similarity, and recent methods address this by exploiting complementary RGB texture and depth geometry cues. However, RGB-D COD methods still underutilize modality-specific cues, which limits fusion quality. We attribute this to RGB and depth features being fused directly after backbone extraction, without modality-specific enhancement. To address this limitation, we propose MHENet, an RGB-D COD framework that performs modality-specific hierarchical enhancement and adaptive fusion of RGB and depth features. Specifically, we introduce a Texture Hierarchical Enhancement Module (THEM) to amplify subtle texture variations by extracting high-frequency information, and a Geometry Hierarchical Enhancement Module (GHEM) to enhance geometric structures via learnable gradient extraction, while preserving cross-scale semantic consistency. Finally, an Adaptive Dynamic Fusion Module (ADFM) adaptively fuses the enhanced texture and geometry features with spatially varying weights. Experiments on four benchmarks demonstrate that MHENet surpasses 16 state-of-the-art methods qualitatively and quantitatively. Code is available at https://github.com/afdsgh/MHENet.
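The adaptive fusion step can be sketched as a per-pixel convex combination: a softmax over two score maps yields spatially varying weights that sum to one at every location, so each pixel can lean on whichever modality is more reliable there. The score maps below are placeholders; in ADFM they would be predicted from the enhanced features themselves (an assumption about the design, not taken from the paper).

```python
import numpy as np

def softmax(a, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_fuse(tex, geo, score_tex, score_geo):
    """Fuse texture and geometry feature maps with per-pixel weights.
    score_tex/score_geo are hypothetical confidence maps standing in
    for ADFM's learned weight branch."""
    # Per-pixel weights: at each (i, j), w[0] + w[1] == 1.
    w = softmax(np.stack([score_tex, score_geo]), axis=0)
    return w[0] * tex + w[1] * geo
```

With equal scores everywhere this reduces to a plain average; as one score grows, the output smoothly approaches that modality's feature at those pixels, which is the behavior a spatially varying fusion is meant to provide over a fixed global mixing weight.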