FACTOR: Counterfactual Training-Free Test-Time Adaptation for Open-Vocabulary Object Detection

arXiv cs.CV / 5/6/2026



Key Points

Open-vocabulary object detection can break under distribution shifts because the detector may latch onto non-causal visual attributes (such as brightness or texture) that are spuriously correlated with object categories rather than reflecting their true semantics.
Prior test-time adaptation (TTA) approaches either rely on expensive online optimization or apply coarse global calibration, so they fail to target attribute-specific failure modes.
FACTOR introduces a counterfactual, training-free test-time adaptation method that perturbs test images along non-causal attributes and compares region-level predictions between original and counterfactual views.
The method uses these comparisons to estimate attribute sensitivity and semantic relevance, then suppresses attribute-dependent predictions without updating model parameters.
Experiments on PASCAL-C, COCO-C, and FoggyCityscapes indicate FACTOR delivers consistent robustness gains over existing TTA baselines.
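The summary names brightness and texture as examples of non-causal attributes but does not say how FACTOR builds its counterfactual views. As a minimal sketch, assuming float RGB images in [0, 1] with shape (H, W, C), brightness can be varied by scaling intensities and fine texture can be washed out with a box blur. The function names, brightness factors, and kernel sizes below are hypothetical illustrations, not the paper's settings.

```python
import numpy as np


def perturb_brightness(image: np.ndarray, factor: float) -> np.ndarray:
    """Scale pixel intensities by `factor` and clip back to [0, 1]."""
    return np.clip(image * factor, 0.0, 1.0)


def perturb_texture(image: np.ndarray, kernel: int = 3) -> np.ndarray:
    """Box-blur every channel to wash out fine texture (edge-padded, odd kernel)."""
    pad = kernel // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    h, w = image.shape[:2]
    out = np.zeros_like(image)
    # Sum the kernel x kernel shifted copies of the image, then average them.
    for dy in range(kernel):
        for dx in range(kernel):
            out += padded[dy:dy + h, dx:dx + w, :]
    return out / (kernel * kernel)


def counterfactual_views(image, brightness_factors=(0.6, 1.4), blur_kernels=(3, 5)):
    """Hypothetical helper: return {view_name: image}, each view altering one attribute."""
    image = np.asarray(image, dtype=np.float64)
    views = {}
    for f in brightness_factors:
        views[f"brightness_{f}"] = perturb_brightness(image, f)
    for k in blur_kernels:
        views[f"blur_{k}"] = perturb_texture(image, k)
    return views
```

Each view would then be run through the same detector as the original image so that region-level predictions can be compared. Perturbing one attribute per view keeps any change in the predictions attributable to that attribute alone.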

Abstract

Open-vocabulary object detection often fails under distribution shifts, as it can be misled by spurious correlations between non-causal visual attributes (e.g., brightness, texture) and object categories. Existing test-time adaptation (TTA) methods either depend on costly online optimization or perform global calibration, overlooking the attribute-specific nature of these failures. To address this, we propose FACTOR (counterFACtual training-free Test-time adaptation for Open-vocabulaRy object detection), a lightweight framework grounded in counterfactual reasoning. By perturbing test images along non-causal attributes and comparing region-level predictions between original and counterfactual views, FACTOR quantifies attribute sensitivity, semantic relevance, and prediction variation to selectively suppress attribute-dependent predictions without any parameter updates. Experiments on PASCAL-C, COCO-C, and FoggyCityscapes show that FACTOR consistently outperforms prior TTA methods, demonstrating that explicit counterfactual reasoning effectively improves robustness under distribution shifts.
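The abstract names three quantities (attribute sensitivity, semantic relevance, prediction variation) but not how they are defined or combined. The sketch below is one hypothetical way to wire them together, assuming the same region proposals are re-scored on every counterfactual view so that regions align across views, and that region features and class-name text embeddings live in a shared space, as in CLIP-style detectors. The per-quantity definitions, the combination `0.5 * (sensitivity + variation) * (1 - relevance)`, and the name `factor_style_rescore` are illustrative assumptions, not FACTOR's published formulas.

```python
import numpy as np


def factor_style_rescore(p_orig, p_cf, region_emb, text_emb):
    """Hypothetical counterfactual re-scoring of region predictions.

    p_orig:     (R, K) class probabilities per region on the original image
    p_cf:       (V, R, K) probabilities for the SAME regions on V counterfactual views
    region_emb: (R, D) region features; text_emb: (K, D) class-name text features
    Returns adjusted top-1 scores (R,) and top-1 labels (R,).
    """
    labels = p_orig.argmax(axis=1)
    rows = np.arange(p_orig.shape[0])
    score = p_orig[rows, labels]

    # Attribute sensitivity (assumed): mean absolute change of the predicted
    # class's probability across counterfactual views.
    cf_score = p_cf[:, rows, labels]  # shape (V, R)
    sensitivity = np.abs(cf_score - score[None, :]).mean(axis=0)

    # Prediction variation (assumed): fraction of views whose top-1 label flips.
    variation = (p_cf.argmax(axis=2) != labels[None, :]).mean(axis=0)

    # Semantic relevance (assumed): cosine similarity between each region and the
    # text embedding of its predicted class, mapped from [-1, 1] to [0, 1].
    r = region_emb / np.linalg.norm(region_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    relevance = (np.sum(r * t[labels], axis=1) + 1.0) / 2.0

    # Dependence is high when a prediction moves under non-causal perturbations
    # AND is weakly tied to its class semantics; it stays in [0, 1].
    dependence = 0.5 * (sensitivity + variation) * (1.0 - relevance)
    return score * (1.0 - dependence), labels
```

For example, with two regions where the first keeps its label under every view and matches its class text exactly, while the second flips label under every view and is orthogonal to its class text, the first region keeps its score of 0.9 and the second drops from 0.8 to about 0.51. Scaling scores rather than deleting boxes leaves the final keep/drop decision to the detector's usual score threshold.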
