Denoise and Align: Towards Source-Free UDA for Robust Panoramic Semantic Segmentation

arXiv cs.CV / March 27, 2026


Key Points

  • The paper addresses robust panoramic semantic segmentation under source-free unsupervised domain adaptation (SFUDA), motivated by privacy/proprietary constraints that prevent access to labeled source data.
  • It identifies two core difficulties amplified by the source-free setting: domain shift that yields unreliable pseudo-labels and performance collapse on minority classes.
  • DAPASS introduces PCGD (Panoramic Confidence-Guided Denoising) to produce class-balanced, high-fidelity pseudo-labels via perturbation consistency and neighborhood-level confidence filtering.
  • It also proposes CRAM (Contextual Resolution Adversarial Module) to handle panoramic geometric distortions and scale variance by adversarially aligning fine details from high-resolution crops with global semantics from low-resolution context.
  • Experiments report state-of-the-art results on Cityscapes-to-DensePASS (55.04% mIoU) and Stanford2D3D (70.38% mIoU), showing consistent gains over prior methods.
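The PCGD idea described above — keep a pixel's pseudo-label only when predictions agree across perturbations and its local confidence clears a class-balanced threshold — can be sketched as follows. This is a hypothetical NumPy illustration of the general technique, not the paper's implementation; the neighborhood size, threshold schedule, and class-balancing rule are all assumptions.

```python
import numpy as np

def denoise_pseudo_labels(probs_a, probs_b, conf_thresh=0.8, k=3):
    """Sketch of confidence-guided pseudo-label denoising (hypothetical).

    probs_a, probs_b: (H, W, C) softmax outputs for the same image under
    two different perturbations. A pixel keeps its pseudo-label only if
    (1) both views agree on the class and (2) the mean confidence in its
    k x k neighborhood passes a per-class threshold; all other pixels
    are marked ignore (-1).
    """
    pred_a = probs_a.argmax(-1)
    pred_b = probs_b.argmax(-1)
    conf = probs_a.max(-1)

    # Neighborhood-level confidence: local mean via a k x k box filter.
    pad = k // 2
    padded = np.pad(conf, pad, mode="edge")
    local = np.zeros_like(conf)
    H, W = conf.shape
    for i in range(k):
        for j in range(k):
            local += padded[i:i + H, j:j + W]
    local /= k * k

    # Perturbation consistency: discard pixels where the views disagree.
    labels = np.where(pred_a == pred_b, pred_a, -1)

    # Class-balanced thresholds (assumed rule): relax the cutoff for
    # rare classes so minority classes are not filtered away entirely.
    counts = np.bincount(labels[labels >= 0], minlength=probs_a.shape[-1])
    freq = counts / max(counts.sum(), 1)
    thresh = conf_thresh * (0.5 + 0.5 * freq / max(freq.max(), 1e-8))

    keep = (labels >= 0) & (local >= thresh[np.clip(labels, 0, None)])
    return np.where(keep, labels, -1)
```

The key design point is that filtering operates on neighborhood averages rather than single-pixel confidences, so isolated high-confidence noise does not survive, while coherent low-frequency regions belonging to minority classes can.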

Abstract

Panoramic semantic segmentation is pivotal for comprehensive 360° scene understanding in critical applications like autonomous driving and virtual reality. However, progress in this domain is constrained by two key challenges: the severe geometric distortions inherent in panoramic projections and the prohibitive cost of dense annotation. While Unsupervised Domain Adaptation (UDA) from label-rich pinhole-camera datasets offers a viable alternative, many real-world tasks impose a stricter source-free (SFUDA) constraint where source data is inaccessible for privacy or proprietary reasons. This constraint significantly amplifies the core problems of domain shift, leading to unreliable pseudo-labels and dramatic performance degradation, particularly for minority classes. To overcome these limitations, we propose the DAPASS framework. DAPASS introduces two synergistic modules to robustly transfer knowledge without source data. First, our Panoramic Confidence-Guided Denoising (PCGD) module generates high-fidelity, class-balanced pseudo-labels by enforcing perturbation consistency and incorporating neighborhood-level confidence to filter noise. Second, a Contextual Resolution Adversarial Module (CRAM) explicitly addresses scale variance and distortion by adversarially aligning fine-grained details from high-resolution crops with global semantics from low-resolution contexts. DAPASS achieves state-of-the-art performance on outdoor (Cityscapes-to-DensePASS) and indoor (Stanford2D3D) benchmarks, yielding 55.04% (+2.05%) and 70.38% (+1.54%) mIoU, respectively.
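To make the CRAM input pairing concrete, the sketch below builds the two views the abstract describes: a high-resolution random crop that preserves fine detail, and a low-resolution view of the full panorama that preserves global context. The function name, crop size, and stride-based downsampling are assumptions for illustration; the discriminator and gradient-reversal machinery that would adversarially align features from the two views is omitted.

```python
import numpy as np

def resolution_pair(img, crop=128, scale=4, rng=None):
    """Hypothetical construction of a CRAM-style input pair.

    img: (H, W, 3) panorama. Returns (hr_crop, lr_context):
    - hr_crop: a random crop at full resolution (fine-grained detail);
    - lr_context: the whole panorama downsampled by `scale`
      (global semantics, including panoramic distortion patterns).
    """
    rng = rng or np.random.default_rng(0)
    H, W, _ = img.shape
    y = int(rng.integers(0, H - crop + 1))
    x = int(rng.integers(0, W - crop + 1))
    hr_crop = img[y:y + crop, x:x + crop]
    lr_context = img[::scale, ::scale]  # naive stride downsample
    return hr_crop, lr_context
```

In a full pipeline, both views would pass through the segmentation backbone and a domain discriminator would be trained to distinguish their feature statistics, pushing the encoder toward resolution-invariant representations.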