Generalized Small Object Detection: A Point-Prompted Paradigm and Benchmark

arXiv cs.CV / 4/6/2026



Key Points

The paper introduces TinySet-9M, a large-scale, multi-domain dataset designed to address the long-standing lack of high-quality data for small object detection.
It establishes a benchmark for evaluating existing label-efficient detection methods on small objects and finds that the weak visual cues of small objects further degrade the performance of these methods.
To address the weak semantic representation of small objects, the authors move beyond training-time feature enhancement and propose Point-Prompt Small Object Detection (P2SOD), a paradigm that uses sparse point prompts at inference time as an information bridge for category-level localization.
Building on P2SOD and TinySet-9M, the paper presents DEAL, a scalable and transferable point-prompted framework that learns robust prompt-conditioned representations from large-scale data.
With only a single click at inference, DEAL reportedly achieves a 31.4% relative improvement over fully supervised baselines under strict localization metrics (e.g., AP75) on TinySet-9M, and it generalizes to unseen categories and unseen datasets.
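
To make the "strict localization" point concrete: AP75 counts a prediction as correct only if its intersection over union (IoU) with the ground-truth box is at least 0.75. The sketch below is an illustration and not code from the paper. It uses hypothetical square boxes and a hypothetical one-pixel diagonal offset to show how the same localization error costs far more IoU on a small object than on a large one.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def shifted_iou(side, shift):
    """IoU between a side x side box and a copy shifted diagonally by `shift` pixels."""
    ground_truth = (0, 0, side, side)
    prediction = (shift, shift, side + shift, side + shift)
    return iou(ground_truth, prediction)


AP75_THRESHOLD = 0.75  # IoU threshold used by the AP75 metric

# Hypothetical object sizes in pixels; the 1-pixel offset is also an assumption.
for side in (10, 32, 100):
    value = shifted_iou(side, 1)
    print(side, round(value, 3), value >= AP75_THRESHOLD)
```

Under these assumptions, the 10-pixel box drops to an IoU of about 0.681 and fails the 0.75 threshold, while the 32-pixel and 100-pixel boxes stay at about 0.884 and 0.961 and pass.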

Abstract

Small object detection (SOD) remains challenging due to extremely limited pixels and ambiguous object boundaries. These characteristics lead to challenging annotation, limited availability of large-scale high-quality datasets, and inherently weak semantic representations for small objects. In this work, we first address the data limitation by introducing TinySet-9M, the first large-scale, multi-domain dataset for small object detection. Beyond filling the gap in large-scale datasets, we establish a benchmark to evaluate the effectiveness of existing label-efficient detection methods for small objects. Our evaluation reveals that weak visual cues further exacerbate the performance degradation of label-efficient methods in small object detection, highlighting a critical challenge in label-efficient SOD. Secondly, to tackle the limitation of insufficient semantic representation, we move beyond training-time feature enhancement and propose a new paradigm termed Point-Prompt Small Object Detection (P2SOD). This paradigm introduces sparse point prompts at inference time as an efficient information bridge for category-level localization, enabling semantic augmentation. Building upon the P2SOD paradigm and the large-scale TinySet-9M dataset, we further develop DEAL (DEtect Any smalL object), a scalable and transferable point-prompted detection framework that learns robust, prompt-conditioned representations from large-scale data. With only a single click at inference time, DEAL achieves a 31.4% relative improvement over fully supervised baselines under strict localization metrics (e.g., AP75) on TinySet-9M, while generalizing effectively to unseen categories and unseen datasets. Our project is available at https://zhuhaoraneis.github.io/TinySet-9M/.
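
A note on reading the headline number: a 31.4% relative improvement is a ratio against the baseline score, not a gain of 31.4 AP points. The minimal sketch below shows the arithmetic. The baseline value of 20.0 is a hypothetical placeholder chosen for illustration; the abstract does not state the baseline or the resulting AP75 values.

```python
def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


def score_after_relative_gain(baseline, relative_gain):
    """Score implied by applying a relative gain (e.g. 0.314) to a baseline."""
    return baseline * (1.0 + relative_gain)


# Hypothetical baseline AP75; NOT a number reported in the paper.
hypothetical_baseline_ap75 = 20.0
reported_relative_gain = 0.314  # the 31.4% figure from the abstract

implied_ap75 = score_after_relative_gain(hypothetical_baseline_ap75, reported_relative_gain)
absolute_gain = implied_ap75 - hypothetical_baseline_ap75

print(f"implied AP75: {implied_ap75:.2f}")
print(f"absolute gain in AP points: {absolute_gain:.2f}")
```

With this placeholder baseline, the implied score is about 26.28, an absolute gain of about 6.28 points. The absolute gain scales with whatever the true baseline is.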