Divide-and-Conquer Approach to Holistic Cognition in High-Similarity Contexts with Limited Data

arXiv cs.CV / 4/22/2026

Key Points

The paper addresses ultra-fine-grained visual categorization (Ultra-FGVC) under limited training data by targeting holistic yet discriminative visual cues (e.g., leaf contours) that prior work has left under-explored.
It proposes DHCNet, a divide-and-conquer holistic cognition network that simplifies modeling complex holistic morphology by decomposing cues into spatially associated subtle discrepancies and building holistic understanding progressively.
DHCNet uses a self-shuffling operation to analyze discrepancies from small local patches to larger regions, while leveraging unaffected local areas to help infer spatial/topological associations among shuffled patches.
The method introduces online refinement of the discovered holistic cues during training and uses these cues as supervision to improve the recognition model’s sensitivity to holistic features across whole objects.
The authors report that DHCNet achieves strong performance on five widely used Ultra-FGVC benchmark datasets, which supports its use for data-limited, high-similarity classification tasks.

Abstract

Ultra-fine-grained visual categorization (Ultra-FGVC) aims to classify highly similar subcategories within fine-grained objects using limited training samples. However, holistic yet discriminative cues, such as leaf contours in extremely similar cultivars, remain under-explored in current studies, which limits recognition performance. Although these cues are crucial, modeling their complex morphological structures typically requires massive numbers of training samples, posing significant challenges in data-limited scenarios. To address this challenge, we propose a novel Divide-and-Conquer Holistic Cognition Network (DHCNet) that implements a divide-and-conquer strategy: it decomposes holistic cues into spatially associated subtle discrepancies and progressively establishes the holistic cognition process, significantly simplifying holistic cognition while reducing dependency on training data. Technically, DHCNet begins by progressively analyzing subtle discrepancies, transitioning from smaller local patches to larger ones using a self-shuffling operation on local regions. Simultaneously, it leverages the unaffected local regions to potentially guide the perception of the original topological structure among the shuffled patches, thereby helping to establish spatial associations for these discrepancies. Additionally, DHCNet incorporates online refinement of the holistic cues discovered from local regions into the training process to iteratively improve their quality. DHCNet then uses these holistic cues as supervisory signals to fine-tune the parameters of the recognition model, improving its sensitivity to holistic cues across entire objects. Extensive evaluations demonstrate that DHCNet achieves remarkable performance on five widely used Ultra-FGVC datasets.
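To make the self-shuffling idea more concrete, here is a minimal sketch of shuffling patches inside one local region while leaving the rest of the image untouched, repeated with a growing patch size. It is an illustration under stated assumptions, not the authors' implementation: the function name `self_shuffle`, the square-region parameters (`top`, `left`, `size`), the uniform random permutation, and the toy 8x8 grid are all hypothetical, since the abstract does not specify how regions or patch sizes are chosen.

```python
import random


def self_shuffle(image, top, left, size, patch, seed=0):
    """Shuffle patch-sized tiles inside one square region of a 2D grid.

    Pixels outside the region are left untouched. Returns the new grid and
    the permutation used, where perm[dst] is the source tile index.
    (Hypothetical helper; the paper's exact operation is not given.)
    """
    if size % patch != 0:
        raise ValueError("patch must divide the region size")
    n = size // patch                      # tiles per side of the region
    perm = list(range(n * n))
    random.Random(seed).shuffle(perm)      # seeded for reproducibility

    out = [row[:] for row in image]        # copy so the input stays intact
    for dst, src in enumerate(perm):
        dr, dc = divmod(dst, n)            # destination tile coordinates
        sr, sc = divmod(src, n)            # source tile coordinates
        for i in range(patch):
            for j in range(patch):
                src_val = image[top + sr * patch + i][left + sc * patch + j]
                out[top + dr * patch + i][left + dc * patch + j] = src_val
    return out, perm


# Toy 8x8 "image" whose pixel values encode their own position.
image = [[r * 8 + c for c in range(8)] for r in range(8)]

# Assumed progressive schedule: small patches first, then larger ones.
for patch in (1, 2):
    shuffled, perm = self_shuffle(image, top=2, left=2, size=4, patch=patch)
    print(f"patch={patch}: permutation of {len(perm)} tiles")
```

Because the pixels outside the shuffled region keep their original positions, a model trained to recover `perm` could in principle use that intact context to infer where each tile belongs, which is the role the abstract assigns to the unaffected local regions.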