Hard to See, Hard to Label: Generative and Symbolic Acquisition for Subtle Visual Phenomena

arXiv cs.CV / 4/28/2026

Key Points

  • The paper targets a key active-learning failure mode for subtle anomalies (e.g., hairline cracks and low-contrast inclusions) where common heuristics overselect dominant visual patterns and miss rare, structurally atypical regions.
  • It introduces GSAL, an active-learning framework for object detection that combines a diffusion-based “visual difficulty” signal with a hierarchical semantic coverage prior.
  • The diffusion part prioritizes proposals using reconstruction discrepancy and denoising variability, aiming to surface visually ambiguous or atypical examples that uncertainty-only methods may miss.
  • To avoid repeatedly selecting difficult samples within the same dominant semantic mode, GSAL uses a three-level concept graph to encourage acquisition across underrepresented semantic regions with interpretable rationales.
  • Experiments on a proprietary thin-film defect inspection dataset, Pascal VOC, and MS COCO show improved label efficiency and better retrieval of rare classes than uncertainty-, diversity-, and hybrid baselines.
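
As a rough illustration of the diffusion-based difficulty signal described in the key points, the sketch below combines a reconstruction discrepancy term with a denoising variability term. The summary does not give the paper's actual formula, so everything here is an assumption: the function name `difficulty_score`, the `denoise` callable (a stand-in for a diffusion model's noise-then-denoise pass), the mixing weight `alpha`, and the choice of mean squared error and per-pixel variance.

```python
import numpy as np


def difficulty_score(x, denoise, noise_std=0.5, n_draws=8, alpha=0.5, rng=None):
    """Hypothetical difficulty score for one image or proposal crop `x`.

    `denoise` is any callable mapping a noisy array to a reconstruction of
    the same shape. It stands in for a diffusion model and is an assumption,
    not the paper's API.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Perturb the input several times and reconstruct each noisy copy.
    recons = np.stack([
        denoise(x + noise_std * rng.standard_normal(x.shape))
        for _ in range(n_draws)
    ])

    # Reconstruction discrepancy: how far the average reconstruction is from x.
    discrepancy = float(np.mean((recons.mean(axis=0) - x) ** 2))

    # Denoising variability: how much reconstructions disagree across draws.
    variability = float(np.mean(recons.var(axis=0)))

    # Assumed convex combination of the two terms.
    return alpha * discrepancy + (1.0 - alpha) * variability
```

With a toy denoiser that replaces every pixel by the patch mean, a flat patch reconstructs almost perfectly while a patch containing a thin bright line does not, so the line patch receives the higher score. A real diffusion model would replace the toy denoiser.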

Abstract

Subtle visual anomalies such as hairline cracks, sub-millimeter voids, and low-contrast inclusions are structurally atypical yet visually ambiguous, making them both difficult to annotate and easy to overlook during active learning. Standard acquisition heuristics based on discriminative uncertainty or feature diversity often overselect dominant patterns while underexploring sparse yet important regions of the data space. This failure mode is especially severe in industrial defect inspection, where anomalies may be both low-prevalence and difficult to distinguish from surrounding structure. To address this, we propose GSAL, an active learning framework for object detection that combines a diffusion-based difficulty signal with a hierarchical semantic coverage prior. The diffusion component scores images and proposals using reconstruction discrepancy and denoising variability, prioritizing visually atypical or ambiguous examples. However, diffusion alone does not prevent acquisition from repeatedly favoring hard samples within dominant semantic modes. The semantic component therefore organizes candidate samples in a three-level concept graph and promotes coverage of underrepresented semantic regions while providing interpretable acquisition rationales. By balancing visual difficulty with semantic coverage, GSAL improves retrieval of subtle and rare targets that are often missed by uncertainty-only selection. Experiments on a proprietary thin-film defect dataset, Pascal VOC, and MS COCO show consistent gains in label efficiency and rare-class retrieval over uncertainty-, diversity-, and hybrid baselines.
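
To make the balance between visual difficulty and semantic coverage concrete, here is a minimal sketch of one way a coverage-aware greedy selection could work. It is not the paper's algorithm. The function `select_batch`, the weight `lam`, the `1 / (1 + count)` bonus, and the representation of the three-level concept graph as coarse-to-fine path tuples are all illustrative assumptions.

```python
from collections import Counter


def select_batch(candidates, labeled_paths, budget, lam=1.0):
    """Greedy acquisition balancing difficulty against semantic coverage.

    candidates: list of (sample_id, difficulty, path), where `path` is a
        3-tuple of concept names from coarse to fine (assumed encoding of a
        three-level concept graph).
    labeled_paths: concept paths of samples that are already labeled.
    Returns a list of (sample_id, path); the path doubles as a simple
    human-readable rationale for why the sample was picked.
    """
    counts = Counter()

    def add(path):
        # Count the sample at every level: (l1,), (l1, l2), (l1, l2, l3).
        for depth in range(1, len(path) + 1):
            counts[path[:depth]] += 1

    def coverage_bonus(path):
        # Rarely seen nodes get a bonus near 1, crowded nodes a bonus near 0.
        levels = range(1, len(path) + 1)
        return sum(1.0 / (1 + counts[path[:d]]) for d in levels) / len(path)

    for path in labeled_paths:
        add(path)

    remaining = list(candidates)
    chosen = []
    while remaining and len(chosen) < budget:
        best = max(remaining, key=lambda c: c[1] + lam * coverage_bonus(c[2]))
        remaining.remove(best)
        add(best[2])  # update counts so the next pick sees reduced novelty
        chosen.append((best[0], best[2]))
    return chosen
```

In this sketch, setting `lam=0` reduces the rule to difficulty-only ranking, which can keep picking hard samples from an already crowded concept. A positive `lam` lets a moderately difficult sample from an unseen branch outrank harder samples from a dominant one.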