FACTOR: Counterfactual Training-Free Test-Time Adaptation for Open-Vocabulary Object Detection

arXiv cs.CV / 5/6/2026


Key Points

  • Open-vocabulary object detection can break under distribution shifts because detectors may latch onto non-causal visual attributes (such as brightness or texture) that are spuriously correlated with object classes, instead of relying on true semantics.
  • Prior test-time adaptation (TTA) approaches tend to be either too expensive, because they rely on online optimization, or too coarse, because they apply a single global calibration; neither targets attribute-specific failure modes.
  • FACTOR introduces a counterfactual, training-free test-time adaptation method that perturbs test images along non-causal attributes and compares region-level predictions between original and counterfactual views.
  • The method uses these comparisons to estimate attribute sensitivity, semantic relevance, and prediction variation, then suppresses attribute-dependent predictions without updating model parameters.
  • Experiments on PASCAL-C, COCO-C, and FoggyCityscapes indicate FACTOR delivers consistent robustness gains over existing TTA baselines.
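The summary does not specify which perturbation operators FACTOR actually uses. As a minimal sketch, assuming brightness scaling and additive Gaussian noise (a crude stand-in for a texture change) represent "non-causal attributes", counterfactual views of a test image could be generated as below. The function names (`brightness_view`, `texture_view`, `counterfactual_views`) and the perturbation strengths are hypothetical choices for illustration, not the paper's.

```python
import numpy as np

def brightness_view(image: np.ndarray, factor: float) -> np.ndarray:
    """Hypothetical brightness counterfactual: scale intensities, keep them in [0, 1]."""
    return np.clip(image * factor, 0.0, 1.0)

def texture_view(image: np.ndarray, sigma: float, seed: int = 0) -> np.ndarray:
    """Hypothetical texture counterfactual: add pixel-wise Gaussian noise, keep values in [0, 1]."""
    rng = np.random.default_rng(seed)
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

def counterfactual_views(image, brightness_factors=(0.6, 1.4), noise_sigmas=(0.05,)):
    """Build a dict of views that alter only the assumed non-causal attributes."""
    views = {}
    for f in brightness_factors:
        views[f"brightness_{f}"] = brightness_view(image, f)
    for s in noise_sigmas:
        views[f"texture_{s}"] = texture_view(image, s)
    return views
```

In a full pipeline, the original image and each counterfactual view would be passed through the same frozen detector, and the region-level predictions compared; the perturbations should leave object identity unchanged so that any shift in a region's score can be attributed to the perturbed attribute.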

Abstract

Open-vocabulary object detection often fails under distribution shifts, as it can be misled by spurious correlations between non-causal visual attributes (e.g., brightness, texture) and object categories. Existing test-time adaptation (TTA) methods either depend on costly online optimization or perform global calibration, overlooking the attribute-specific nature of these failures. To address this, we propose FACTOR (counterFACtual training-free Test-time adaptation for Open-vocabulaRy object detection), a lightweight framework grounded in counterfactual reasoning. By perturbing test images along non-causal attributes and comparing region-level predictions between original and counterfactual views, FACTOR quantifies attribute sensitivity, semantic relevance, and prediction variation to selectively suppress attribute-dependent predictions, without parameter updates. Experiments on PASCAL-C, COCO-C, and FoggyCityscapes show that FACTOR consistently outperforms prior TTA methods, demonstrating that explicit counterfactual reasoning effectively improves robustness under distribution shifts.
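The abstract names three signals (attribute sensitivity, semantic relevance, prediction variation) but not how they are computed or combined. The sketch below is one hypothetical way to turn them into a training-free score adjustment, assuming the same region boxes are re-scored in each of K counterfactual views and that a relevance value in [0, 1] is available per region (for example, a rescaled region-text similarity). The formula, the names (`adjust_region_scores`, `alpha`, `beta`), and the defaults are illustrative assumptions, not FACTOR's published method.

```python
import numpy as np

def adjust_region_scores(s0, S, relevance, alpha=5.0, beta=5.0):
    """Hypothetical counterfactual score adjustment (no parameter updates).

    s0:        (R,) confidence of each region in the original view
    S:         (K, R) confidence of the same regions in K counterfactual views
    relevance: (R,) assumed semantic relevance in [0, 1]
    """
    s0 = np.asarray(s0, dtype=float)
    S = np.asarray(S, dtype=float)
    relevance = np.clip(np.asarray(relevance, dtype=float), 0.0, 1.0)

    # Attribute sensitivity: mean absolute score shift under attribute edits.
    sensitivity = np.abs(S - s0[None, :]).mean(axis=0)
    # Prediction variation: spread of scores across the counterfactual views.
    variation = S.std(axis=0)

    # Assumed combination: high relevance shields a region from suppression,
    # while sensitive, unstable, low-relevance regions are down-weighted.
    penalty = (alpha * sensitivity + beta * variation) * (1.0 - relevance)
    return s0 * np.exp(-penalty)

s0 = [0.9, 0.8]
S = [[0.88, 0.20],
     [0.91, 0.30]]
relevance = [0.9, 0.2]
out = adjust_region_scores(s0, S, relevance)
```

With these toy numbers, the first region barely moves under the perturbations and keeps a score of roughly 0.89, while the second collapses when brightness or texture changes and is suppressed to roughly 0.07. A multiplicative exponential penalty was chosen only so that regions with zero sensitivity and variation keep their original score exactly.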