SGANet: Semantic and Geometric Alignment for Multimodal Multi-view Anomaly Detection

arXiv cs.CV / 4/8/2026


Key Points

  • The paper introduces SGANet, a unified framework for multimodal multi-view anomaly detection that tackles feature inconsistency caused by viewpoint changes and modality discrepancies.
  • SGANet combines three components—SCFRM for selective cross-view feature refinement, SSPA for semantic and structural patch alignment across modalities, and MVGA for geometric alignment across viewpoints.
  • By jointly modeling cross-view feature interaction, semantic and structural consistency, and global geometric correspondence, SGANet aims to learn feature representations that stay physically coherent across viewpoints and modalities.
  • The authors report state-of-the-art results for both anomaly detection and localization on the SiM3D and Eyecandies datasets, which they present as evidence of effectiveness in realistic industrial scenarios.
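
To make the idea of cross-modal patch alignment more concrete, here is a minimal sketch of one common way to score agreement between two modalities: the mean cosine distance between features at the same patch index. This is not the paper's SSPA formulation, which the summary does not specify; the function names, the choice of cosine distance, and the toy feature vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cross_modal_alignment_loss(rgb_patches, depth_patches):
    """Hypothetical alignment loss: mean (1 - cosine) over patches sharing an index.

    Assumes both modalities were already mapped to the same patch grid and
    feature dimension; this is an illustrative stand-in, not the paper's loss.
    """
    if not rgb_patches or len(rgb_patches) != len(depth_patches):
        raise ValueError("both modalities need the same non-zero number of patches")
    distances = [1.0 - cosine(r, d) for r, d in zip(rgb_patches, depth_patches)]
    return sum(distances) / len(distances)


# Toy features: patch 0 agrees across modalities, patch 1 is orthogonal.
rgb = [[1.0, 0.0], [0.0, 2.0]]
depth = [[2.0, 0.0], [1.0, 0.0]]
loss = cross_modal_alignment_loss(rgb, depth)
print(loss)
```

A lower value means the two modalities describe each patch more consistently. In an unsupervised setting, a term of this kind would typically be minimized on defect-free samples only.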

Abstract

Multi-view anomaly detection aims to identify surface defects on complex objects using observations captured from multiple viewpoints. However, existing unsupervised methods often suffer from feature inconsistency arising from viewpoint variations and modality discrepancies. To address these challenges, we propose a Semantic and Geometric Alignment Network (SGANet), a unified framework for multimodal multi-view anomaly detection that effectively combines semantic and geometric alignment to learn physically coherent feature representations across viewpoints and modalities. SGANet consists of three key components. The Selective Cross-view Feature Refinement Module (SCFRM) selectively aggregates informative patch features from adjacent views to enhance cross-view feature interaction. The Semantic-Structural Patch Alignment (SSPA) enforces semantic alignment across modalities while maintaining structural consistency under viewpoint transformations. The Multi-View Geometric Alignment (MVGA) further aligns geometrically corresponding patches across viewpoints. By jointly modeling feature interaction, semantic and structural consistency, and global geometric correspondence, SGANet effectively enhances anomaly detection performance in multimodal multi-view settings. Extensive experiments on the SiM3D and Eyecandies datasets demonstrate that SGANet achieves state-of-the-art performance in both anomaly detection and localization, validating its effectiveness in realistic industrial scenarios.
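
The abstract describes SCFRM as selectively aggregating informative patch features from adjacent views. One plausible reading of "selective" is a top-k selection by similarity followed by a weighted average, and the sketch below illustrates that reading only. The function `refine_patch`, the `top_k` and `blend` parameters, the softmax weighting, and the residual blend are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def refine_patch(query, neighbor_patches, top_k=2, blend=0.5):
    """Hypothetical selective refinement of one patch feature.

    1. Score every patch from an adjacent view by cosine similarity to `query`.
    2. Keep only the `top_k` most similar patches (the "selective" step).
    3. Average them with softmax weights over their similarity scores.
    4. Blend the aggregate back into the query as a residual update.
    """
    scored = sorted(
        ((cosine(query, p), p) for p in neighbor_patches),
        key=lambda pair: pair[0],
        reverse=True,
    )[:top_k]
    if not scored:
        return list(query)  # nothing to aggregate, keep the feature unchanged

    max_score = max(score for score, _ in scored)
    weights = [math.exp(score - max_score) for score, _ in scored]
    total = sum(weights)
    aggregated = [
        sum(w * patch[i] for w, (_, patch) in zip(weights, scored)) / total
        for i in range(len(query))
    ]
    return [(1.0 - blend) * q + blend * a for q, a in zip(query, aggregated)]


# Toy example: only the most similar neighbor patch is kept (top_k=1).
query = [1.0, 0.0]
adjacent_view = [[2.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
refined = refine_patch(query, adjacent_view, top_k=1, blend=0.5)
print(refined)
```

Restricting the aggregation to the most similar patches keeps unrelated regions of the neighboring view from diluting the feature, which is the intuition behind a selective rather than dense cross-view interaction.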