PKINet-v2: Towards Powerful and Efficient Poly-Kernel Remote Sensing Object Detection

arXiv cs.CV / 3/18/2026

Key Points

  • PKINet-v2 combines anisotropic axial-strip convolutions with isotropic square kernels to create a multi-scope receptive field that preserves fine details while capturing long-range context.
  • It introduces a Heterogeneous Kernel Re-parameterization (HKR) strategy that fuses all branches into a single depth-wise convolution for efficient inference without accuracy loss.
  • The model achieves state-of-the-art accuracy on four remote sensing benchmarks (DOTA-v1.0, DOTA-v1.5, HRSC2016, and DIOR-R) and delivers a 3.9× FPS speedup over PKINet-v1.
  • By jointly handling slender and broad targets, PKINet-v2 addresses the challenges of diverse aspect ratios and sizes in remote sensing object detection.
  • The approach improves both accuracy and deployment efficiency, making it practical for remote sensing imaging pipelines.

Abstract

Object detection in remote sensing images (RSIs) is challenged by the coexistence of geometric and spatial complexity: targets may appear with diverse aspect ratios, while spanning a wide range of object sizes under varied contexts. Existing RSI backbones address the two challenges separately, either by adopting anisotropic strip kernels to model slender targets or by using isotropic large kernels to capture broader context. However, such isolated treatments lead to complementary drawbacks: the strip-only design can disrupt spatial coherence for regular-shaped objects and weaken tiny details, whereas isotropic large kernels often introduce severe background noise and geometric mismatch for slender structures. In this paper, we extend PKINet, and present a powerful and efficient backbone that jointly handles both challenges within a unified paradigm named Poly Kernel Inception Network v2 (PKINet-v2). PKINet-v2 synergizes anisotropic axial-strip convolutions with isotropic square kernels and builds a multi-scope receptive field, preserving fine-grained local textures while progressively aggregating long-range context across scales. To enable efficient deployment, we further introduce a Heterogeneous Kernel Re-parameterization (HKR) Strategy that fuses all heterogeneous branches into a single depth-wise convolution for inference, eliminating fragmented kernel launches without accuracy loss. Extensive experiments on four widely-used benchmarks, including DOTA-v1.0, DOTA-v1.5, HRSC2016, and DIOR-R, demonstrate that PKINet-v2 achieves state-of-the-art accuracy while delivering a 3.9× FPS acceleration compared to PKINet-v1, surpassing previous remote sensing backbones in both effectiveness and efficiency.
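
The abstract does not spell out how HKR merges its branches, so the following is only a minimal sketch of the general idea behind kernel re-parameterization, not the paper's implementation. It assumes the branches are parallel, summed, stride-1, zero-padded ("same") convolutions with odd kernel sizes and no nonlinearity between them. Under those assumptions, linearity lets a square kernel and two axial strips be zero-padded to a common size and added into one kernel. A single channel is shown because a depth-wise convolution treats each channel independently. The kernel sizes (3×3, 1×7, 7×1) and all function names are illustrative choices, not taken from the paper.

```python
import random


def pad_kernel(kernel, size):
    """Zero-pad an odd-sized (kh x kw) kernel to (size x size), keeping it centered."""
    kh, kw = len(kernel), len(kernel[0])
    top, left = (size - kh) // 2, (size - kw) // 2
    out = [[0.0] * size for _ in range(size)]
    for i in range(kh):
        for j in range(kw):
            out[top + i][left + j] = kernel[i][j]
    return out


def conv2d_same(image, kernel):
    """Single-channel cross-correlation, stride 1, zero 'same' padding."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


def fuse_branches(kernels):
    """Sum zero-padded branch kernels into one square kernel (hypothetical helper)."""
    size = max(max(len(k), len(k[0])) for k in kernels)
    fused = [[0.0] * size for _ in range(size)]
    for k in kernels:
        padded = pad_kernel(k, size)
        for i in range(size):
            for j in range(size):
                fused[i][j] += padded[i][j]
    return fused


rng = random.Random(0)


def rand_kernel(kh, kw):
    return [[rng.uniform(-1, 1) for _ in range(kw)] for _ in range(kh)]


image = [[rng.uniform(-1, 1) for _ in range(9)] for _ in range(9)]

# Illustrative branches: one isotropic square kernel plus two axial strips.
branches = [rand_kernel(3, 3), rand_kernel(1, 7), rand_kernel(7, 1)]

# Training-time view: run every branch, then sum the outputs.
branch_outputs = [conv2d_same(image, k) for k in branches]
multi = [[sum(o[y][x] for o in branch_outputs) for x in range(9)] for y in range(9)]

# Inference-time view: one convolution with the fused kernel.
fused = fuse_branches(branches)
single = conv2d_same(image, fused)

# Largest absolute difference between the two views (floating-point noise only).
max_err = max(abs(multi[y][x] - single[y][x]) for y in range(9) for x in range(9))
```

In this toy setting the fused 7×7 kernel reproduces the summed branch outputs up to rounding, which is the sense in which a merge of this kind costs no accuracy. The benefit at inference is one kernel launch instead of three. Branches with normalization layers or dilation would need extra folding steps that this sketch does not cover.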