High-Precision Dichotomous Image Segmentation via Depth Integrity-Prior and Fine-Grained Patch Strategy

arXiv cs.CV · April 29, 2026


Key Points

  • The paper addresses high-precision dichotomous image segmentation (DIS) by highlighting a trade-off: non-diffusion methods are fast but suffer from weak semantics and unstable spatial priors, while diffusion-based methods are accurate but computationally expensive.
  • It introduces a “depth integrity-prior,” observing that complete objects tend to form low-variance, smoothly connected regions with sharp boundaries in depth maps, whereas backgrounds show chaotic high-variance patterns due to disconnected surfaces.
  • Because DIS datasets typically lack depth maps, the authors generate pseudo-depth with a monocular depth estimation model, quickly capturing the semantic and depth-aware spatial differences between foreground objects and the background.
  • The proposed Prior-guided Depth Fusion Network (PDFNet) fuses RGB with pseudo-depth features, adds a depth integrity-prior loss for depth-consistent segmentation, and uses a fine-grained enhancement module with adaptive patch selection to improve boundary sharpness.
  • Experiments report state-of-the-art performance (Fmax 0.915 on both DIS-VD and DIS-TE) while using less than half the parameters of diffusion-based methods; the code is publicly available.
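The depth integrity-prior can be illustrated numerically: a complete object forming a smooth, connected surface has near-zero local depth variance, while disconnected background surfaces at varying depths produce high variance. Below is a minimal NumPy sketch on a synthetic pseudo-depth map (not the paper's implementation; the window size and all names are illustrative):

```python
import numpy as np

def local_depth_variance(depth, k=3):
    """Local variance of a depth map over k x k windows (valid region).

    Low variance suggests a smooth, connected surface (object interior);
    high variance suggests disconnected background surfaces.
    """
    # Sliding-window view over all k x k patches (NumPy >= 1.20).
    windows = np.lib.stride_tricks.sliding_window_view(depth, (k, k))
    return windows.var(axis=(-2, -1))

rng = np.random.default_rng(0)
depth = rng.uniform(2.0, 10.0, size=(32, 32))  # chaotic background depths
depth[8:24, 8:24] = 5.0                        # flat object at a constant depth

var_map = local_depth_variance(depth)
fg = var_map[10:20, 10:20].mean()  # windows fully inside the object
bg = var_map[:5, :5].mean()        # windows in a background corner
print(fg < bg)  # the object interior is the low-variance region
```

This is only a toy check of the observation; the paper exploits the prior through learned feature fusion and a dedicated loss rather than an explicit variance filter.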

Abstract

High-precision dichotomous image segmentation (DIS) is the task of extracting fine-grained objects from high-resolution images. Existing methods trade efficiency for accuracy: non-diffusion methods are fast but suffer from weak semantics and unstable spatial priors, causing false detections; diffusion-based methods offer high accuracy via strong generative priors but are computationally expensive. In depth maps, a complete object appears as a low-variance region with a smooth interior and sharp boundaries, whereas the background exhibits a chaotic, high-variance pattern due to disconnected surfaces at varying depths. We refer to this as the depth integrity-prior. Inspired by this, and noting that DIS datasets currently lack depth maps, we leverage pseudo-depth from monocular depth estimation models to obtain essential semantic understanding, rapidly revealing spatial differences between target objects and the background. To exploit this prior, we propose the Prior-guided Depth Fusion Network (PDFNet), which fuses RGB and pseudo-depth features for depth-aware structure perception. We further introduce a novel depth integrity-prior loss to enforce depth consistency in segmentation and a fine-grained enhancement module with adaptive patch selection to sharpen boundaries. Notably, PDFNet with DAM-v2 achieves state-of-the-art results (Fmax 0.915 on DIS-VD and 0.915 on DIS-TE) using less than half the parameters of diffusion-based methods. Our code is available at https://tennine2077.github.io/PDFNet.github.io/.
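The abstract does not spell out the depth integrity-prior loss, so the following is only a plausible sketch of the idea: weight the pseudo-depth map by the soft predicted mask and penalize the mask-weighted depth variance, so a mask that leaks onto background surfaces at other depths incurs a higher loss. The function name and exact formulation are assumptions, not the paper's definition:

```python
import numpy as np

def depth_integrity_loss(pred, depth, eps=1e-6):
    """Hypothetical depth integrity-prior loss: soft-mask-weighted depth variance.

    pred  : soft foreground probabilities in [0, 1], shape (H, W)
    depth : pseudo-depth map, shape (H, W)

    A mask covering one connected object at a coherent depth yields low
    weighted variance; a mask spilling onto surfaces at other depths raises it.
    """
    w = pred / (pred.sum() + eps)         # normalize the mask into weights
    mu = (w * depth).sum()                # mask-weighted mean depth
    return (w * (depth - mu) ** 2).sum()  # mask-weighted depth variance

rng = np.random.default_rng(1)
depth = rng.uniform(2.0, 10.0, size=(32, 32))
depth[8:24, 8:24] = 5.0                 # object at a constant depth

tight = np.zeros((32, 32))
tight[8:24, 8:24] = 1.0                 # mask confined to the object
leaky = np.ones((32, 32))               # mask leaking over the whole image

loss_tight = depth_integrity_loss(tight, depth)
loss_leaky = depth_integrity_loss(leaky, depth)
print(loss_tight < loss_leaky)  # depth-consistent masks are cheaper
```

In training, such a term would be added to the usual segmentation losses to steer predictions toward depth-consistent regions.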
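The fine-grained enhancement module's adaptive patch selection is likewise described only at a high level. One common realization of this idea is to rank non-overlapping patches of the soft mask by boundary uncertainty (probabilities near 0.5) and refine only the top-k. A hypothetical NumPy sketch, with the scoring rule and all names assumed rather than taken from the paper:

```python
import numpy as np

def select_uncertain_patches(prob, patch=8, topk=4):
    """Hypothetical adaptive patch selection: score non-overlapping patches of
    a soft mask by uncertainty and return the top-k patch origins, i.e. the
    regions a fine-grained module would refine for sharper boundaries.
    """
    h, w = prob.shape
    scores = {}
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            p = prob[i:i + patch, j:j + patch]
            # Uncertainty peaks where p is near 0.5 (object boundaries).
            scores[(i, j)] = (0.25 - (p - 0.5) ** 2).mean()
    return sorted(scores, key=scores.get, reverse=True)[:topk]

prob = np.zeros((32, 32))
prob[8:24, 8:24] = 1.0       # confident object interior
prob[7:9, 8:24] = 0.5        # uncertain strip along the top boundary

coords = select_uncertain_patches(prob)
print(coords)  # every selected patch touches the uncertain boundary strip
```

Restricting expensive high-resolution refinement to a few uncertain patches is what keeps such a scheme cheap compared with refining the full image.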