FSDETR: Frequency-Spatial Feature Enhancement for Small Object Detection

arXiv cs.CV · April 17, 2026

📰 News · Models & Research

Key Points

  • Small object detection is difficult because downsampling degrades features, dense scenes cause mutual occlusion, and complex backgrounds interfere with recognition.
  • The paper introduces FSDETR, a frequency–spatial feature enhancement framework built on the RT-DETR baseline, aiming to better preserve complementary structural information.
  • FSDETR uses a Spatial Hierarchical Attention Block (SHAB) to capture both local details and global dependencies for stronger semantic representation.
  • To address occlusion and dense-scene challenges, it adds a Deformable Attention-based Intra-scale Feature Interaction (DA-AIFI) module that dynamically samples informative regions instead of attending uniformly over the feature map.
  • It also proposes a Frequency-Spatial Feature Pyramid Network (FSFPN) with a Cross-domain Frequency-Spatial Block (CFSB) that combines frequency filtering with spatial edge extraction, achieving strong small-object results with only 14.7M parameters.
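At its core, deformable-attention-style dynamic sampling means bilinearly sampling a feature map at a few offset locations around a reference point and combining the sampled values with softmax weights, rather than attending to every position. A minimal NumPy sketch of that single-query, single-head case (all names, shapes, and the offset/weight inputs are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly sample a 2D feature map at fractional coords (y, x)."""
    H, W = feat.shape
    y0, x0 = max(int(np.floor(y)), 0), max(int(np.floor(x)), 0)
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0]
            + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0]
            + wy * wx * feat[y1, x1])

def deformable_sample(feat, ref, offsets, weights):
    """Weighted sum of features sampled at ref + offsets.

    In a full deformable attention layer, offsets and weights would be
    predicted from the query; here they are passed in for illustration.
    """
    vals = np.array([bilinear_sample(feat, ref[0] + dy, ref[1] + dx)
                     for dy, dx in offsets])
    w = np.exp(weights - weights.max())
    w /= w.sum()                      # softmax over sampling points
    return float(vals @ w)
```

Because only a handful of points are sampled per query, this keeps attention cost linear in the number of sampling points, which is what makes it attractive for dense small-object scenes.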

Abstract

Small object detection remains a significant challenge due to feature degradation from downsampling, mutual occlusion in dense clusters, and complex background interference. To address these issues, this paper proposes FSDETR, a frequency-spatial feature enhancement framework built upon the RT-DETR baseline. By establishing a collaborative modeling mechanism, the method effectively leverages complementary structural information. Specifically, a Spatial Hierarchical Attention Block (SHAB) captures both local details and global dependencies to strengthen semantic representation. Furthermore, to mitigate occlusion in dense scenes, the Deformable Attention-based Intra-scale Feature Interaction (DA-AIFI) focuses on informative regions via dynamic sampling. Finally, the Frequency-Spatial Feature Pyramid Network (FSFPN) integrates frequency filtering with spatial edge extraction via the Cross-domain Frequency-Spatial Block (CFSB) to preserve fine-grained details. Experimental results show that with only 14.7M parameters, FSDETR achieves 13.9% AP_S on VisDrone 2019 and 48.95% AP50^tiny on TinyPerson, demonstrating strong performance on small-object benchmarks. The code and models are available at https://github.com/YT3DVision/FSDETR.
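The summary does not give CFSB's exact formulation, but "frequency filtering combined with spatial edge extraction" can be illustrated with standard operations: an FFT-based high-pass filter (frequency branch) blended with a Sobel gradient magnitude (spatial branch). A minimal NumPy sketch under those assumptions; the function names, the cutoff, and the fusion weight `alpha` are hypothetical, not the paper's design:

```python
import numpy as np

def high_pass_filter(feat, cutoff=0.25):
    """Suppress low frequencies in the 2D spectrum (frequency branch)."""
    H, W = feat.shape
    F = np.fft.fftshift(np.fft.fft2(feat))
    yy, xx = np.mgrid[:H, :W]
    dist = np.hypot(yy - H / 2, xx - W / 2)
    mask = dist > cutoff * min(H, W) / 2   # keep only the high-frequency band
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))

def sobel_edges(feat):
    """Gradient magnitude via 3x3 Sobel kernels (spatial branch)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    H, W = feat.shape
    gx, gy = np.zeros_like(feat), np.zeros_like(feat)
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            patch = feat[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy)

def cross_domain_fuse(feat, alpha=0.5):
    """Blend frequency-domain detail with the spatial edge response."""
    return alpha * high_pass_filter(feat) + (1 - alpha) * sobel_edges(feat)
```

Both branches respond to fine structure: a flat region yields zero in each, while small-object boundaries light up in both, which is the intuition behind fusing the two domains to preserve detail that downsampling would otherwise wash out.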