DA-Mamba: Learning Domain-Aware State Space Model for Global-Local Alignment in Domain Adaptive Object Detection

arXiv cs.CV / 3/20/2026

Key Points

  • DA-Mamba proposes a hybrid CNN-State Space Model architecture to enhance domain adaptive object detection by capturing both local details and long-range dependencies with linear-time complexity.
  • It introduces two modules, Image-Aware SSM (IA-SSM) in the backbone for image-level global/local alignment and Object-Aware SSM (OA-SSM) in the detection head for modeling spatial and semantic dependencies among objects.
  • Pairing CNNs with SSMs gives long-range modeling in linear time, avoiding the quadratic cost of attention in transformer-based DAOD methods.
  • According to the abstract, experiments on DAOD benchmarks show that DA-Mamba improves cross-domain detection performance efficiently.
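
To make the linear-versus-quadratic contrast in the key points concrete, the sketch below compares rough multiply-add counts for an attention score matrix and for a single SSM scan over a flattened feature map. The function names and the sizes used are illustrative assumptions, not figures from the paper, and the counts ignore constant factors and implementation details.

```python
def flattened_tokens(height: int, width: int) -> int:
    """Number of tokens when an H x W feature map is flattened into a sequence."""
    return height * width


def attention_pairwise_cost(num_tokens: int, dim: int) -> int:
    """Rough multiply-add count for the QK^T score matrix of one attention head.

    Every token is compared with every other token, so the cost is quadratic
    in the sequence length.
    """
    return num_tokens * num_tokens * dim


def ssm_scan_cost(num_tokens: int, dim: int, state_size: int) -> int:
    """Rough multiply-add count for one pass of a diagonal SSM recurrence.

    Each token updates a fixed-size state once, so the cost is linear in the
    sequence length. `state_size` is a hypothetical per-channel state dimension.
    """
    return num_tokens * dim * state_size


if __name__ == "__main__":
    for side in (32, 64, 128):
        tokens = flattened_tokens(side, side)
        print(
            side,
            tokens,
            attention_pairwise_cost(tokens, dim=64),
            ssm_scan_cost(tokens, dim=64, state_size=16),
        )
```

Doubling both sides of the feature map quadruples the token count; under these rough counts the attention term then grows 16-fold while the scan term grows 4-fold.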

Abstract

Domain Adaptive Object Detection (DAOD) aims to transfer detectors from a labeled source domain to an unlabeled target domain. Existing DAOD methods employ multi-granularity feature alignment to learn domain-invariant representations. However, the local connectivity of their CNN-based backbones and detection heads restricts alignment to local regions, so they fail to extract global domain-invariant features. Although transformer-based DAOD methods capture global dependencies via attention mechanisms, their quadratic computational cost hinders practical deployment. To address this, we propose DA-Mamba, a hybrid CNN-State Space Model (SSM) architecture that combines the efficiency of CNNs with the linear-time long-range modeling capability of SSMs to capture both global and local domain-invariant features. Specifically, we introduce two novel modules: Image-Aware SSM (IA-SSM) and Object-Aware SSM (OA-SSM). IA-SSM is integrated into the backbone to enhance global domain awareness, enabling image-level global and local alignment. OA-SSM is inserted into the detection head to model spatial and semantic dependencies among objects, enhancing instance-level alignment. Comprehensive experiments demonstrate that the proposed method can efficiently improve the cross-domain performance of the object detector.
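
The abstract does not spell out the internals of IA-SSM or OA-SSM, so the following is only a minimal sketch of the generic idea it relies on: a discrete linear state space recurrence run over a flattened CNN feature map, where every output position can depend on all earlier positions at a cost that is linear in sequence length. The function names, the diagonal parameterization, and the raster scan order are assumptions made for illustration and are not the authors' modules.

```python
import numpy as np


def flatten_feature_map(fmap: np.ndarray) -> np.ndarray:
    """Turn a (C, H, W) feature map into an (H*W, C) token sequence.

    Row-major raster order is an assumption; other scan orders are possible.
    """
    channels, height, width = fmap.shape
    return fmap.reshape(channels, height * width).T


def ssm_scan(x: np.ndarray, a: np.ndarray, b: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Run a diagonal linear SSM over a sequence in a single pass.

    x: (L, D) input sequence.
    a, b, c: (D, N) per-channel parameters (hypothetical, fixed here rather
    than learned or input-dependent).

    Recurrence per channel:  h_t = a * h_{t-1} + b * x_t,   y_t = sum_n c * h_t
    """
    seq_len, dim = x.shape
    h = np.zeros_like(a, dtype=float)           # hidden state, shape (D, N)
    y = np.empty((seq_len, dim), dtype=float)
    for t in range(seq_len):                    # one state update per token
        h = a * h + b * x[t][:, None]
        y[t] = (c * h).sum(axis=1)
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fmap = rng.standard_normal((8, 4, 4))       # toy (C, H, W) feature map
    tokens = flatten_feature_map(fmap)          # (16, 8)
    a = np.full((8, 4), 0.9)                    # decay below 1 keeps the state stable
    b = np.ones((8, 4))
    c = np.ones((8, 4)) / 4.0
    print(ssm_scan(tokens, a, b, c).shape)
```

The state `h` carries information from all earlier tokens, which is what lets a scan of this kind mix distant image regions without forming a pairwise attention matrix. Mamba-style models additionally make the parameters depend on the input, which this sketch omits.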