PiCo: Active Manifold Canonicalization for Robust Robotic Visual Anomaly Detection

arXiv cs.CV / 3/25/2026


Key Points

  • PiCo (Pose-in-Condition Canonicalization) introduces an Active Canonicalization paradigm to improve robotic visual anomaly detection under diverse 6-DoF poses, illumination changes, and physical disturbances.
  • The framework is a two-stage cascade: Active Physical Canonicalization has the robot reorient objects to reduce geometric uncertainty, and Neural Latent Canonicalization then removes remaining nuisance factors through a three-level denoising hierarchy (photometric processing at the input level, latent refinement at the feature level, and contextual reasoning at the semantic level).
  • Experiments on the large-scale M2AD benchmark show PiCo reaches 93.7% O-AUROC (a 3.7% gain over prior methods) in static settings and 98.5% accuracy in active closed-loop scenarios.
  • The results suggest that actively projecting observations onto a condition-invariant canonical manifold is important for robust embodied perception.
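
For readers unfamiliar with the headline metric, the sketch below shows how an AUROC figure like the one reported can be computed from anomaly scores. It assumes that O-AUROC denotes an object-level AUROC and that per-view scores are aggregated into one object score by taking the maximum; neither detail is stated in this summary, and the function names `auroc` and `object_score` are illustrative only.

```python
from typing import List, Sequence


def object_score(view_scores: Sequence[float]) -> float:
    """Aggregate per-view anomaly scores into one object-level score.

    Assumption: max aggregation. The benchmark may define this differently.
    """
    return max(view_scores)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random anomalous sample outscores
    a random normal sample (ties count as half)."""
    pos: List[float] = [s for y, s in zip(labels, scores) if y == 1]
    neg: List[float] = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Four objects, each observed from two views; label 1 marks an anomalous object.
labels = [0, 0, 1, 1]
views = [[0.05, 0.10], [0.40, 0.20], [0.35, 0.30], [0.80, 0.60]]
scores = [object_score(v) for v in views]
print(auroc(labels, scores))  # 0.75
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; a rank-based implementation would be preferable on a large benchmark.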

Abstract

Industrial deployment of robotic visual anomaly detection (VAD) is fundamentally constrained by passive perception under diverse 6-DoF pose configurations and unstable operating conditions such as illumination changes and shadows, where intrinsic semantic anomalies and physical disturbances coexist and interact. To overcome these limitations, a paradigm shift from passive feature learning to Active Canonicalization is proposed. PiCo (Pose-in-Condition Canonicalization) is introduced as a unified framework that actively projects observations onto a condition-invariant canonical manifold. PiCo operates through a cascaded mechanism. The first stage, Active Physical Canonicalization, enables a robotic agent to reorient objects in order to reduce geometric uncertainty at its source. The second stage, Neural Latent Canonicalization, adopts a three-stage denoising hierarchy consisting of photometric processing at the input level, latent refinement at the feature level, and contextual reasoning at the semantic level, progressively eliminating nuisance factors across representational scales. Extensive evaluations on the large-scale M2AD benchmark demonstrate the superiority of this paradigm. PiCo achieves a state-of-the-art 93.7% O-AUROC, representing a 3.7% improvement over prior methods in static settings, and attains 98.5% accuracy in active closed-loop scenarios. These results demonstrate that active manifold canonicalization is critical for robust embodied perception.
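
The cascade described in the abstract can be illustrated with a toy sketch. Everything below is a hypothetical stand-in rather than the authors' method: the `SimObject` class, the proportional pose correction, the mean/variance normalization, the moving-average "latent refinement", and the max-deviation score are simple placeholders chosen only to show how a physical reorientation stage can feed a three-level nuisance-removal stage.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[float]]  # toy grayscale image as nested lists


@dataclass
class SimObject:
    """Hypothetical simulated object: yaw offset from the canonical view (degrees)."""
    yaw_deg: float

    def rotate(self, delta_deg: float) -> None:
        # Apply the rotation and wrap the result into [-180, 180).
        self.yaw_deg = (self.yaw_deg + delta_deg + 180.0) % 360.0 - 180.0


def active_physical_canonicalize(obj: SimObject, gain: float = 0.8,
                                 tol_deg: float = 2.0, max_steps: int = 20) -> int:
    """Stage 1 stand-in: closed-loop proportional corrections toward the canonical pose."""
    steps = 0
    while abs(obj.yaw_deg) > tol_deg and steps < max_steps:
        obj.rotate(-gain * obj.yaw_deg)
        steps += 1
    return steps


def photometric_normalize(img: Image) -> Image:
    """Input level: zero-mean, unit-variance normalization (removes gain and offset)."""
    flat = [p for row in img for p in row]
    mean = sum(flat) / len(flat)
    std = (sum((p - mean) ** 2 for p in flat) / len(flat)) ** 0.5
    std = std if std > 0 else 1.0
    return [[(p - mean) / std for p in row] for row in img]


def latent_refine(img: Image) -> List[float]:
    """Feature level: flatten, then smooth with a 3-tap moving average."""
    flat = [p for row in img for p in row]
    out = []
    for i in range(len(flat)):
        window = flat[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out


def contextual_score(feats: List[float], reference: List[float]) -> float:
    """Semantic level: largest deviation from the features of a known-normal reference."""
    return max(abs(f - r) for f, r in zip(feats, reference))


def neural_latent_canonicalize(img: Image, reference_img: Image) -> float:
    """Stage 2 stand-in: run the three levels in order and return an anomaly score."""
    feats = latent_refine(photometric_normalize(img))
    ref = latent_refine(photometric_normalize(reference_img))
    return contextual_score(feats, ref)


# Stage 1: bring a badly posed object close to the canonical view.
obj = SimObject(yaw_deg=90.0)
steps = active_physical_canonicalize(obj)

# Stage 2: an illumination change should score near zero, a defect should not.
normal = [[0.2, 0.4], [0.6, 0.8]]
brighter = [[2.0 * p + 0.1 for p in row] for row in normal]
defect = [[0.2, 0.4], [0.6, 0.1]]
print(steps, neural_latent_canonicalize(brighter, normal),
      neural_latent_canonicalize(defect, normal))
```

In this toy setting, the affine brightness change is cancelled at the photometric level, so only the defective image receives a clearly nonzero score. This mirrors the idea of removing nuisance factors before anomaly reasoning.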