Bridge: Basis-Driven Causal Inference Marries VFMs for Domain Generalization

arXiv cs.CV / 4/30/2026


Key Points

  • The paper introduces Bridge, a basis-driven causal inference framework for domain generalization in object detection to address performance loss from source–target distribution gaps.
  • Bridge uses front-door adjustment with learned low-rank bases to block confounder effects (such as illumination, co-occurrence, and style), reducing spurious correlations that harm transfer.
  • It also improves representations by filtering redundant and task-irrelevant components, leading to more robust detection features.
  • Bridge is designed to plug into both discriminative and generative vision foundation models (including DINOv2/3, SAM, and Stable Diffusion) without changing their core architectures.
  • Experiments on multiple domain generalization detection datasets, including Diverse Weather DroneVehicle (a newly augmented, real-world UAV-based benchmark), show that Bridge outperforms prior state-of-the-art methods.
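
For readers unfamiliar with the term, front-door adjustment is a standard identity from causal inference. The form below is the textbook version, not a formula taken from the paper. The symbols are assumptions for illustration: X is the input image, Y is the detection output, M is a mediator through which X affects Y (plausibly the basis-refined features, though the summary does not say so explicitly), and x' ranges over inputs drawn from the source distribution.

```latex
% Textbook front-door adjustment (illustrative; not quoted from the paper).
% X: input, M: mediator, Y: prediction, x': inputs marginalized over P(x').
P\bigl(Y \mid \mathrm{do}(X = x)\bigr)
  = \sum_{m} P(m \mid x) \sum_{x'} P\bigl(Y \mid m, x'\bigr)\, P(x')
```

The inner sum averages the prediction over other inputs x', which is what removes the dependence on unobserved confounders such as illumination or style, provided the mediator captures all of the effect of X on Y.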

Abstract

Detectors often suffer from degraded performance, primarily due to the distributional gap between the source and target domains. This issue is especially evident in single-source domains with limited data, as models tend to rely on confounders (e.g., illumination, co-occurrence, and style) from the source domain, leading to spurious correlations that hinder generalization. To this end, this paper proposes a novel Basis-driven framework for domain generalization, namely Bridge, that incorporates causal inference into object detection. By learning the low-rank bases for front-door adjustment, Bridge blocks confounders' effects to mitigate spurious correlations, while simultaneously refining representations by filtering redundant and task-irrelevant components. Bridge can be seamlessly integrated with both discriminative (e.g., DINOv2/3, SAM) and generative (e.g., Stable Diffusion) Vision Foundation Models (VFMs). Extensive experiments across multiple domain generalization object detection datasets, i.e., Cross-Camera, Adverse Weather, Real-to-Artistic, Diverse Weather Datasets, and Diverse Weather DroneVehicle (our newly augmented real-world UAV-based benchmark), underscore the superiority of our proposed method over previous state-of-the-art approaches. The project page is available at: https://mingbohong.github.io/Bridge/.
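
To make the idea of refining features with low-rank bases more concrete, here is a minimal sketch in Python with NumPy. It is not the paper's implementation: the function name `project_onto_bases`, the softmax-attention weighting, the random (untrained) bases, and all shapes are assumptions chosen for illustration. It shows only the generic mechanism by which re-expressing features through a small set of k basis vectors caps the rank of the output at k, which discards components outside the span of the bases.

```python
import numpy as np


def project_onto_bases(features, bases):
    """Re-express each feature as a convex combination of k basis vectors.

    Hypothetical helper, not the paper's method.
    features: (n, d) array of backbone features.
    bases:    (k, d) array of basis vectors, with k much smaller than d.
    Returns the refined (n, d) features and the (n, k) mixing weights.
    """
    d = features.shape[1]
    # Scaled dot-product similarity between every feature and every basis.
    scores = features @ bases.T / np.sqrt(d)
    # Numerically stable softmax over the k bases.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Each output row lies in the span of the k bases, so rank <= k.
    return weights @ bases, weights


# Toy data: 32 feature vectors of dimension 64, and 4 random bases.
# In a real system the bases would be learned; here they are random.
rng = np.random.default_rng(0)
features = rng.normal(size=(32, 64))
bases = rng.normal(size=(4, 64))

refined, weights = project_onto_bases(features, bases)
print(refined.shape, np.linalg.matrix_rank(refined))
```

Because the refined features have the same shape as the input, a module of this kind could in principle sit on top of a frozen backbone without changing its architecture, which is consistent with the plug-in claim in the abstract.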