Adaptive Slicing-Assisted Hyper Inference for Enhanced Small Object Detection in High-Resolution Imagery

arXiv cs.CV / 4/22/2026


Key Points

  • The paper introduces Adaptive Slicing-Assisted Hyper Inference (ASAHI) to improve small-object detection in high-resolution aerial/satellite imagery, where dense scenes and tiny targets make existing detectors struggle.
  • Unlike fixed patch slicing, ASAHI adaptively chooses the number of overlapping slices (6 or 12) based on image resolution using a learned threshold, aiming to cut redundant computation.
  • It includes a slicing-assisted fine-tuning (SAF) strategy that trains on both full-resolution images and sliced patches, preserving detection quality while benefiting from larger effective receptive fields.
  • For crowded scenes, ASAHI uses Cluster-DIoU-NMS (CDN) to merge detections efficiently and suppress duplicates using center-distance-aware DIoU logic.
  • Experiments on VisDrone2019 and xView report state-of-the-art results (56.8% on VisDrone2019-DET-val, 22.7% on xView-test) and a 20–25% inference-time reduction versus the SAHI baseline.
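
The resolution-adaptive slicing described above can be sketched roughly as follows. Note that the 2×3 / 3×4 grid layouts, the pixel-area threshold, and the 20% overlap ratio here are illustrative assumptions; the paper only states that 6 or 12 overlapping slices are chosen via a learned resolution threshold.

```python
from typing import List, Tuple

def adaptive_slices(width: int, height: int,
                    area_threshold: int = 4_000_000,
                    overlap: float = 0.2) -> List[Tuple[int, int, int, int]]:
    """Split an image into 6 (2x3) or 12 (3x4) overlapping patches.

    `area_threshold` stands in for the paper's learned resolution
    threshold; the grid shapes and overlap ratio are assumptions.
    Returns patch boxes as (x0, y0, x1, y1) in pixel coordinates.
    """
    rows, cols = (2, 3) if width * height <= area_threshold else (3, 4)
    # Enlarge each patch so adjacent patches share an `overlap`
    # fraction of their extent, keeping objects cut at borders visible
    # in at least one patch.
    pw = int(width / (cols - overlap * (cols - 1)))
    ph = int(height / (rows - overlap * (rows - 1)))
    step_x = pw - int(overlap * pw)
    step_y = ph - int(overlap * ph)
    boxes = []
    for r in range(rows):
        for c in range(cols):
            # Clamp the last row/column so patches stay inside the image.
            x0 = min(c * step_x, width - pw)
            y0 = min(r * step_y, height - ph)
            boxes.append((x0, y0, x0 + pw, y0 + ph))
    return boxes
```

Each patch is then run through the detector independently, and the per-patch detections are mapped back to full-image coordinates before the NMS merging step.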

Abstract

Deep learning-based object detectors have achieved remarkable success across numerous computer vision applications, yet they continue to struggle with small object detection in high-resolution aerial and satellite imagery, where dense object distributions, variable shooting angles, diminutive target sizes, and substantial inter-class variability pose formidable challenges. Existing slicing strategies that partition high-resolution images into manageable patches have demonstrated promising results for enlarging the effective receptive field of small targets; however, their reliance on fixed slice dimensions introduces significant redundant computation, inflating inference cost and undermining detection speed. In this paper, we propose Adaptive Slicing-Assisted Hyper Inference (ASAHI), a novel slicing framework that shifts the paradigm from prescribing a fixed slice size to adaptively determining the optimal number of slices according to image resolution, thereby substantially mitigating redundant computation while preserving beneficial overlap between adjacent patches. ASAHI integrates three synergistic components: (1) an adaptive resolution-aware slicing algorithm that dynamically generates 6 or 12 overlapping patches based on a learned threshold, (2) a slicing-assisted fine-tuning (SAF) strategy that constructs augmented training data comprising both full-resolution images and sliced patches, and (3) a Cluster-DIoU-NMS (CDN) post-processing module that combines the geometric merging efficiency of Cluster-NMS with the center-distance-aware suppression of DIoU-NMS to achieve robust duplicate elimination in crowded scenes. Extensive experiments on VisDrone2019 and xView demonstrate that ASAHI achieves state-of-the-art performance with 56.8% on VisDrone2019-DET-val and 22.7% on xView-test, while reducing inference time by 20–25% compared to the baseline SAHI method.
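
The center-distance-aware suppression behind the CDN module can be illustrated with the standard DIoU criterion: IoU minus a penalty equal to the squared center distance over the squared diagonal of the smallest enclosing box. The sketch below uses plain greedy NMS for clarity; the paper's CDN additionally adopts the vectorized matrix formulation of Cluster-NMS, which is not reproduced here.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def diou(a: Box, b: Box) -> float:
    """DIoU = IoU - d^2 / c^2, where d is the distance between box
    centers and c is the diagonal of the smallest enclosing box."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iou = inter / (area_a + area_b - inter + 1e-9)
    # Squared distance between box centers.
    d2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 \
       + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both.
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    c2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2 + 1e-9
    return iou - d2 / c2

def diou_nms(boxes: List[Box], scores: List[float],
             thresh: float = 0.5) -> List[int]:
    """Greedy NMS that suppresses a box when its DIoU with any
    already-kept box exceeds `thresh`. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(diou(boxes[i], boxes[k]) <= thresh for k in keep):
            keep.append(i)
    return keep
```

Because the distance penalty lowers the score for overlapping boxes whose centers are far apart, nearby distinct objects in crowded scenes are less likely to suppress each other than under plain IoU-based NMS.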