SFFNet: Synergistic Feature Fusion Network With Dual-Domain Edge Enhancement for UAV Image Object Detection

arXiv cs.CV / 4/6/2026

Key Points

The paper introduces SFFNet, a synergistic feature fusion network designed to improve object detection in UAV images by addressing noisy backgrounds and target scale imbalance.
It proposes a multi-scale dynamic dual-domain coupling (MDDC) module that performs edge enhancement in both the frequency and spatial domains to better separate multi-scale object edges from background noise.
It adds a synergistic feature pyramid network (SFPN) to strengthen the geometric and semantic representation of the detection neck. SFPN uses linear deformable convolutions to capture irregular object shapes and a wide-area perception module (WPM) to establish long-range contextual associations around targets.
The approach comes in six detector scales (N/S/M/B/L/X) to suit different applications and resource-constrained settings; the lightweight N and S models maintain a balance between detection accuracy and parameter efficiency.
Experiments on VisDrone and UAVDT report strong results, with SFFNet-X reaching 36.8 AP and 20.6 AP, respectively; the authors state that the code will be available on GitHub.
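The dual-domain idea behind MDDC can be illustrated with a generic edge extractor. The sketch below is not the paper's MDDC module: it assumes a single-channel image, uses a fixed FFT high-pass mask for the frequency branch and a Sobel gradient for the spatial branch, and fuses the two normalized maps with a hypothetical weight `alpha`. The function names and the `cutoff` value are illustrative assumptions, and MDDC is described as multi-scale and dynamic, which this fixed-filter version omits.

```python
import numpy as np


def frequency_edges(img: np.ndarray, cutoff: float = 0.1) -> np.ndarray:
    """Frequency-domain branch: keep only high frequencies (edges, fine texture)."""
    h, w = img.shape
    spectrum = np.fft.fftshift(np.fft.fft2(img))  # DC component moved to (h // 2, w // 2)
    yy, xx = np.mgrid[0:h, 0:w]
    # Normalized distance of every frequency bin from the DC component.
    dist = np.hypot((yy - h // 2) / h, (xx - w // 2) / w)
    spectrum[dist < cutoff] = 0  # hypothetical fixed high-pass mask
    return np.abs(np.fft.ifft2(np.fft.ifftshift(spectrum)))


def spatial_edges(img: np.ndarray) -> np.ndarray:
    """Spatial-domain branch: Sobel gradient magnitude (3x3 cross-correlation)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    padded = np.pad(img.astype(float), 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    return np.hypot(gx, gy)


def normalize(x: np.ndarray) -> np.ndarray:
    """Min-max scale to [0, 1]; a constant map becomes all zeros."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)


def dual_domain_edges(img: np.ndarray, cutoff: float = 0.1, alpha: float = 0.5) -> np.ndarray:
    """Fuse both branches with a hypothetical fixed weight alpha."""
    freq = normalize(frequency_edges(img, cutoff))
    spat = normalize(spatial_edges(img))
    return alpha * freq + (1 - alpha) * spat


# Toy input: dark left half, bright right half (vertical edge between columns 15 and 16).
img = np.zeros((32, 32))
img[:, 16:] = 1.0
edges = dual_domain_edges(img)
print(edges[:, 16].mean() > edges[:, 8].mean())  # True: the step responds more than the flat region
```

The FFT branch also responds at the left and right image borders, because the DFT treats the image as periodic and sees a jump where the bright right edge wraps around to the dark left edge; a fixed mask cannot tell that artifact from a real object edge, which is one reason a learned, spatially aware module would be preferable.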

Abstract

Object detection in unmanned aerial vehicle (UAV) images remains highly challenging, primarily because of complex background noise and imbalanced target scales. Traditional methods struggle to separate objects from intricate backgrounds and fail to fully exploit the rich multi-scale information in images. To address these issues, we develop a synergistic feature fusion network (SFFNet) with dual-domain edge enhancement, tailored for object detection in UAV images. First, we design the multi-scale dynamic dual-domain coupling (MDDC) module, which introduces a dual-driven edge extraction architecture operating in both the frequency and spatial domains, enabling effective decoupling of multi-scale object edges from background noise. Second, to further enhance the geometric and semantic representation capability of the model's neck, we propose a synergistic feature pyramid network (SFPN). SFPN leverages linear deformable convolutions to adaptively capture irregular object shapes and establishes long-range contextual associations around targets through the designed wide-area perception module (WPM). Moreover, to suit various applications and resource-constrained scenarios, we design six detectors of different scales (N/S/M/B/L/X). Experiments on two challenging aerial datasets (VisDrone and UAVDT) demonstrate the outstanding performance of SFFNet-X, which achieves 36.8 AP and 20.6 AP, respectively. The lightweight models (N/S) also maintain a balance between detection accuracy and parameter efficiency. The code will be available at https://github.com/CQNU-ZhangLab/SFFNet.
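SFPN relies on deformable convolution, in which each kernel sampling point is shifted by a learned, usually fractional, offset and read with bilinear interpolation. The sketch below shows that sampling step for one output location on a single-channel map. It is a generic illustration, not the authors' linear deformable convolution: offsets and weights are passed in rather than predicted by a network, and the hypothetical `initial_points` helper, which lays out N sampling points row-major on a roughly square grid (the usual 3×3 kernel when N = 9), is an assumption chosen only to show that the pattern need not be a k×k square. `bilinear` and `deformable_sample` are likewise illustrative names.

```python
import math

import numpy as np


def initial_points(n: int) -> list[tuple[float, float]]:
    """Hypothetical layout: n points row-major on a grid about sqrt(n) wide, centered on (0, 0)."""
    cols = math.ceil(math.sqrt(n))
    pts = [(i // cols, i % cols) for i in range(n)]
    cy = max(p[0] for p in pts) / 2
    cx = (cols - 1) / 2
    return [(py - cy, px - cx) for py, px in pts]


def bilinear(feat: np.ndarray, y: float, x: float) -> float:
    """Read feat at a fractional (y, x); neighbors outside the map count as zero."""
    h, w = feat.shape
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0
    value = 0.0
    for yy, wy in ((y0, 1 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1 - dx), (x0 + 1, dx)):
            if 0 <= yy < h and 0 <= xx < w:
                value += wy * wx * float(feat[yy, xx])
    return value


def deformable_sample(feat, center, points, offsets, weights) -> float:
    """One output value: weighted sum of features at (center + base point + offset)."""
    cy, cx = center
    return sum(
        wgt * bilinear(feat, cy + py + oy, cx + px + ox)
        for (py, px), (oy, ox), wgt in zip(points, offsets, weights)
    )


feat = np.arange(25, dtype=float).reshape(5, 5)  # feat[y, x] = 5 * y + x
pts = initial_points(9)                           # plain 3x3 pattern
zero = [(0.0, 0.0)] * 9
shift = [(0.5, 0.25)] * 9                         # same fractional offset at every point
ones = [1.0] * 9
print(deformable_sample(feat, (2, 2), pts, zero, ones))   # 108.0 (ordinary 3x3 sum)
print(deformable_sample(feat, (2, 2), pts, shift, ones))  # 132.75
```

With zero offsets the sampler reduces to an ordinary 3×3 sum; the fractional offsets move every tap between pixels, and because the toy map is linear, bilinear interpolation returns exact values. In a trained layer the offsets would come from an extra convolution over the input, which this sketch leaves out.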