Structured prototype regularization for synthetic-to-real driving scene parsing

arXiv cs.CV / 3/18/2026

📰 News | Models & Research

Key Points

  • The paper proposes an unsupervised domain adaptation framework for driving scene parsing that reduces the synthetic-to-real gap by explicitly regularizing semantic feature structures with class-specific prototypes to promote inter-class separation and intra-class compactness.
  • It pairs an entropy-based noise filtering strategy, which improves pseudo-label reliability, with a pixel-level attention mechanism that refines cross-domain feature alignment.
  • Extensive experiments on representative benchmarks show the method consistently outperforms recent state-of-the-art approaches, underscoring the value of preserving semantic structure for robust adaptation.
  • By leveraging synthetic data with automatic labels, the approach aims to reduce annotation costs while improving real-world driving scene parsing performance.
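
To make the prototype idea in the first key point concrete, here is a minimal sketch in plain Python. It is not the paper's loss, which this summary does not specify. It assumes that a prototype is the mean feature vector of a class, that compactness is the mean squared distance of each feature to its own prototype, and that separation is a hinge penalty on prototype pairs closer than a margin. The names `class_prototypes`, `prototype_regularizer`, and `margin` are hypothetical.

```python
import math


def class_prototypes(features, labels):
    """Return one prototype per class: the mean feature vector (assumption)."""
    sums, counts = {}, {}
    for feat, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(feat))
        for i, value in enumerate(feat):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototype_regularizer(features, labels, margin=1.0):
    """Return (compactness, separation) terms; lower is better for both.

    Illustrative formulation only, not the authors' method.
    """
    protos = class_prototypes(features, labels)

    # Intra-class compactness: pull each feature toward its class prototype.
    compact = sum(sq_dist(f, protos[y]) for f, y in zip(features, labels))
    compact /= len(features)

    # Inter-class separation: penalize prototype pairs closer than `margin`.
    classes = sorted(protos)
    pairs = [(a, b) for i, a in enumerate(classes) for b in classes[i + 1:]]
    separate = sum(
        max(0.0, margin - math.sqrt(sq_dist(protos[a], protos[b])))
        for a, b in pairs
    )
    separate = separate / len(pairs) if pairs else 0.0
    return compact, separate


# Toy example: four 2-D "pixel features" from two classes.
feats = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
labs = [0, 0, 1, 1]
compact, separate = prototype_regularizer(feats, labs, margin=5.0)
```

In a training loop the two terms would typically be weighted and added to the segmentation loss; the weights, like the margin, are tuning choices that this summary does not report.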

Abstract

Driving scene parsing is critical for autonomous vehicles to operate reliably in complex real-world traffic environments. To reduce the reliance on costly pixel-level annotations, synthetic datasets with automatically generated labels have become a popular alternative. However, models trained on synthetic data often perform poorly when applied to real-world scenes due to the synthetic-to-real domain gap. Despite the success of unsupervised domain adaptation in narrowing this gap, most existing methods mainly focus on global feature alignment while overlooking the semantic structure of the feature space. As a result, semantic relations among classes are insufficiently modeled, limiting the model's ability to generalize. To address these challenges, this study introduces a novel unsupervised domain adaptation framework that explicitly regularizes semantic feature structures to significantly enhance driving scene parsing performance in real-world scenarios. Specifically, the proposed method enforces inter-class separation and intra-class compactness by leveraging class-specific prototypes, thereby enhancing the discriminability and structural coherence of feature clusters. An entropy-based noise filtering strategy improves the reliability of pseudo labels, while a pixel-level attention mechanism further refines feature alignment. Extensive experiments on representative benchmarks demonstrate that the proposed method consistently outperforms recent state-of-the-art methods. These results underscore the importance of preserving semantic structure for robust synthetic-to-real adaptation in driving scene parsing tasks.
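
The entropy-based filtering step can be illustrated with a short sketch. The abstract does not give the exact criterion, so the version below is an assumption: a target-domain pixel keeps its pseudo label only when the normalized Shannon entropy of its softmax prediction is at or below a threshold, and is otherwise marked with an ignore index. The function names, the `max_entropy` threshold, and the ignore value 255 are illustrative choices, not details from the paper.

```python
import math

IGNORE = 255  # commonly used ignore index in segmentation; an assumption here


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy divided by log(C), so the result lies in [0, 1].

    Requires at least two classes.
    """
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(probs))


def filter_pseudo_labels(pixel_logits, max_entropy=0.5):
    """Assign argmax pseudo labels, dropping high-entropy (uncertain) pixels."""
    labels = []
    for logits in pixel_logits:
        probs = softmax(logits)
        if normalized_entropy(probs) <= max_entropy:
            labels.append(max(range(len(probs)), key=probs.__getitem__))
        else:
            labels.append(IGNORE)  # too uncertain to trust as supervision
    return labels


# Three pixels, three classes: confident, nearly uniform, confident.
logits = [[10.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.0, 8.0]]
pseudo = filter_pseudo_labels(logits)  # -> [0, 255, 2]
```

Pixels marked with the ignore index would simply be excluded from the self-training loss, so only confident target predictions act as supervision.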