EReCu: Pseudo-label Evolution Fusion and Refinement with Multi-Cue Learning for Unsupervised Camouflage Detection

arXiv cs.CV / 3/13/2026



Key Points

The paper introduces EReCu, a unified unsupervised camouflage object detection framework that improves pseudo-label reliability and feature fidelity.
It proposes the Multi-Cue Native Perception module that combines low-level texture cues with mid-level semantics to better align masks with native object information.
It introduces Pseudo-Label Evolution Fusion, enabling teacher-student refinement and using depthwise separable convolutions for efficient semantic denoising.
Spectral Tensor Attention Fusion is used to balance semantic and structural information via compact spectral aggregation across multiple layers of attention maps.
Local Pseudo-Label Refinement leverages attention diversity to recover fine textures and improve boundary fidelity. Overall, the authors report state-of-the-art results on multiple UCOD datasets, with strong generalization under complex camouflage scenarios.
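
The efficiency attributed to depthwise separable convolutions can be checked with simple arithmetic. The sketch below counts weights for a standard convolution and for a depthwise convolution followed by a 1x1 pointwise convolution. The channel counts and kernel size are hypothetical, since the summary does not give EReCu's layer sizes.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k conv plus a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration (not from the paper).
c_in, c_out, k = 64, 64, 3
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
print(std, sep, round(std / sep, 2))  # prints: 36864 4672 7.89
```

For these assumed sizes the separable form needs roughly one eighth of the weights, which is the usual motivation for using it in a lightweight denoising step.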

Abstract

Unsupervised Camouflaged Object Detection (UCOD) remains a challenging task due to the high intrinsic similarity between target objects and their surroundings, as well as the reliance on noisy pseudo-labels that hinder fine-grained texture learning. While existing refinement strategies aim to alleviate label noise, they often overlook intrinsic perceptual cues, leading to boundary overflow and structural ambiguity. In contrast, learning without pseudo-label guidance yields coarse features with significant detail loss. To address these issues, we propose a unified UCOD framework that enhances both the reliability of pseudo-labels and the fidelity of features. Our approach introduces the Multi-Cue Native Perception module, which extracts intrinsic visual priors by integrating low-level texture cues with mid-level semantics, enabling precise alignment between masks and native object information. Additionally, Pseudo-Label Evolution Fusion intelligently refines labels through teacher-student interaction and utilizes depthwise separable convolution for efficient semantic denoising. It also incorporates Spectral Tensor Attention Fusion to effectively balance semantic and structural information through compact spectral aggregation across multi-layer attention maps. Finally, Local Pseudo-Label Refinement plays a pivotal role in local detail optimization by leveraging attention diversity to restore fine textures and enhance boundary fidelity. Extensive experiments on multiple UCOD datasets demonstrate that our method achieves state-of-the-art performance, characterized by superior detail perception, robust boundary alignment, and strong generalization under complex camouflage scenarios.
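
The abstract does not spell out how the teacher and student interact. As a rough illustration only, the sketch below shows an exponential moving average (EMA) update, a common way to maintain a teacher in teacher-student training. Whether EReCu uses EMA is an assumption, as is the `momentum` value, and `ema_update` is a hypothetical helper rather than a function from the paper.

```python
from typing import List


def ema_update(teacher: List[float], student: List[float],
               momentum: float = 0.99) -> List[float]:
    """Return new teacher weights as an EMA of the student weights.

    ASSUMPTION: EMA is a generic teacher-student technique; the paper's
    actual update rule is not described in the abstract.
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same length")
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher, student)]


# Toy weights, purely illustrative.
teacher = [1.0, 0.0]
student = [0.0, 1.0]
teacher = ema_update(teacher, student, momentum=0.9)
print(teacher)  # approximately [0.9, 0.1]
```

A high momentum keeps the teacher stable, so the pseudo-labels it produces change slowly even when the student is updated on noisy targets.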