AI Navigate

Remedying Target-Domain Astigmatism for Cross-Domain Few-Shot Object Detection

arXiv cs.CV / 3/20/2026


Key Points

  • The paper identifies a previously overlooked problem in cross-domain few-shot object detection, which it calls target-domain astigmatism: models show dispersed, unfocused attention in the target domain, which leads to imprecise localization and redundant predictions.
  • It introduces a bio-inspired center-periphery attention refinement framework with three modules: Positive Pattern Refinement using class-specific prototypes to focus attention on semantic objects, Negative Context Modulation to improve boundary discrimination by modeling background context, and Textual Semantic Alignment to strengthen center-periphery distinctions via cross-modal cues.
  • The approach aims to turn astigmatic attention into focused patterns by strengthening the refocusing trend that regular fine-tuning already shows, guided by an analogy to the human fovea-style visual system.
  • Experiments on six challenging CD-FSOD benchmarks demonstrate consistent improvements and establish new state-of-the-art results for cross-domain few-shot object detection.
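
To make the center-periphery idea in the key points more concrete, here is a minimal, dependency-free sketch of scoring feature locations against a class prototype and a background prototype. It is not the paper's implementation: the function names (`prototype`, `cosine`, `center_periphery_contrast`), the use of mask-averaged prototypes, and the cosine-difference score are all assumptions made for illustration.

```python
import math

def prototype(features, mask):
    """Average the feature vectors at locations where mask is truthy."""
    picked = [f for f, m in zip(features, mask) if m]
    if not picked:
        raise ValueError("mask selects no locations")
    dim = len(picked[0])
    return [sum(f[d] for f in picked) / len(picked) for d in range(dim)]

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def center_periphery_contrast(features, object_mask):
    """Hypothetical score: similarity to the object prototype minus
    similarity to the background prototype, per location."""
    pos = prototype(features, object_mask)                       # "center"
    neg = prototype(features, [not m for m in object_mask])      # "periphery"
    return [cosine(f, pos) - cosine(f, neg) for f in features]

# Toy example: four locations with 2-D features; the first two lie on the object.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
object_mask = [True, True, False, False]
scores = center_periphery_contrast(features, object_mask)
```

In this toy example the object locations receive positive scores and the background locations negative ones, which is the kind of separation a focused attention pattern would need; how the paper's modules actually use prototypes, background context, and text cues is not specified in this summary.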

Abstract

Cross-domain few-shot object detection (CD-FSOD) aims to adapt pretrained detectors from a source domain to target domains with limited annotations, a setting that suffers from severe domain shift and data scarcity. In this work, we identify a previously overlooked phenomenon: models exhibit dispersed and unfocused attention in target domains, leading to imprecise localization and redundant predictions, much like a human who cannot focus on visual objects. We therefore call it the target-domain Astigmatism problem. An analysis of attention distances across transformer layers reveals that regular fine-tuning inherently tends to remedy this problem, but the results remain far from satisfactory; this paper aims to strengthen that tendency. Inspired by the human fovea-style visual system, we enhance fine-tuning's inherent trend through a center-periphery attention refinement framework, which contains (1) a Positive Pattern Refinement module that reshapes attention toward semantic objects using class-specific prototypes, simulating the visual center region; (2) a Negative Context Modulation module that enhances boundary discrimination by modeling background context, simulating the visual periphery region; and (3) a Textual Semantic Alignment module that strengthens the center-periphery distinction through cross-modal cues. Our bio-inspired approach transforms astigmatic attention into focused patterns, substantially improving adaptation to target domains. Experiments on six challenging CD-FSOD benchmarks consistently demonstrate improved detection accuracy and establish new state-of-the-art results.
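
The abstract refers to an analysis of attention distances across transformer layers. As a rough illustration of what such a measurement can look like, the sketch below computes a mean attention distance for one attention matrix over a patch grid: the attention-weighted average distance between each query patch and the key patches it attends to. The exact definition used in the paper is not given here, so the function name `mean_attention_distance`, the row-major patch layout, and the Euclidean distance in patch units are assumptions.

```python
import math

def mean_attention_distance(attn, grid_h, grid_w):
    """Attention-weighted query-to-key distance, averaged over queries.

    attn[q][k] is the attention weight from query patch q to key patch k;
    each row is assumed to sum to 1. Patches are indexed row-major on a
    grid_h x grid_w grid, and distances are measured in patch units.
    """
    n = grid_h * grid_w
    if len(attn) != n or any(len(row) != n for row in attn):
        raise ValueError("attn must be an (H*W) x (H*W) matrix")
    coords = [(i // grid_w, i % grid_w) for i in range(n)]
    total = 0.0
    for q, row in enumerate(attn):
        qy, qx = coords[q]
        total += sum(w * math.hypot(qy - ky, qx - kx)
                     for w, (ky, kx) in zip(row, coords))
    return total / n

# Toy 2x2 grid: fully local attention versus fully dispersed attention.
focused = [[1.0 if q == k else 0.0 for k in range(4)] for q in range(4)]
dispersed = [[0.25] * 4 for _ in range(4)]

d_focused = mean_attention_distance(focused, 2, 2)
d_dispersed = mean_attention_distance(dispersed, 2, 2)
```

Under this definition, attention that stays on each patch itself gives a distance of zero, while uniform attention over the grid gives a larger value, so a higher score is one possible proxy for the "dispersed and unfocused" attention the abstract describes.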