Selective Attention-Based Network for Robust Infrared Small Target Detection

arXiv cs.CV / 5/5/2026

Key Points

  • The paper targets infrared small target detection (IRSTD), where dim sub-pixel targets in cluttered backgrounds cause low signal-to-clutter ratios and frequent false alarms.
  • It argues that prior deep-learning encoder–decoder models struggle due to an early-stage information bottleneck and skip connections that cannot adaptively separate true targets from pseudo-target regions.
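  • The proposed SANet (built on U-Net) adds a Dual-path Semantic-aware Module (DSM): one path uses standard convolutions to preserve local detail, the other uses pinwheel-shaped convolutions for wider, direction-sensitive receptive fields, and a Convolutional Block Attention Module (CBAM) then recalibrates the result across spatial and channel dimensions.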
  • SANet further replaces static skip connections with a Selective Attention Fusion Module (SAFM) that uses spatially adaptive, learnable weighting for context-aware cross-scale feature fusion.
  • Together, the two modules aim to sharpen fine-grained target perception and reduce false detections by making both feature extraction and cross-scale fusion adapt to image content.
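
To make the "low signal-to-clutter ratio" in the first point concrete, the sketch below computes one common definition of SCR, |mean(target) - mean(background)| / std(background), on a toy 3x3 patch. The definition, the function name `signal_to_clutter_ratio`, and the pixel values are illustrative assumptions and are not taken from the paper.

```python
from statistics import mean, pstdev


def signal_to_clutter_ratio(patch, target_pixels):
    """Return |mean(target) - mean(background)| / std(background).

    patch: 2D list of pixel intensities (hypothetical toy data).
    target_pixels: list of (row, col) coordinates belonging to the target.
    """
    target_set = set(target_pixels)
    target = [patch[r][c] for r, c in target_set]
    background = [
        value
        for r, row in enumerate(patch)
        for c, value in enumerate(row)
        if (r, c) not in target_set
    ]
    sigma_b = pstdev(background)  # population standard deviation of the clutter
    if sigma_b == 0:
        raise ValueError("background is constant; SCR is undefined")
    return abs(mean(target) - mean(background)) / sigma_b


# A one-pixel target (value 20) on a mildly textured background (10s and 12s).
patch = [
    [10, 12, 10],
    [12, 20, 12],
    [10, 12, 10],
]
print(signal_to_clutter_ratio(patch, [(1, 1)]))  # 9.0
```

Under this definition, a dimmer target or a busier background both lower the ratio, which is the regime the key points describe as prone to false alarms.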

Abstract

Infrared small target detection (IRSTD) plays a pivotal role in a broad spectrum of mission-critical applications, including maritime surveillance, military search and rescue, early warning systems, and precision-guided strikes, all of which demand the precise identification of dim, sub-pixel targets amid highly cluttered infrared backgrounds. Despite significant progress driven by deep learning methods, fundamental challenges persist: infrared small targets occupy extremely limited spatial extents (often only a few pixels), exhibit low signal-to-clutter ratios, and are easily confused with structurally complex backgrounds that frequently induce false alarms. Existing encoder-decoder architectures suffer from two key limitations: an information bottleneck in early convolutional stages that undermines fine-grained target perception, and static skip connections that lack the dynamic adaptability required to discriminate between genuine targets and pseudo-target regions. To address these challenges, we propose SANet, a Selective Attention-based Network built upon the classical U-Net framework and augmented with two novel components: (1) a Dual-path Semantic-aware Module (DSM) that integrates standard convolutions for local spatial detail preservation with pinwheel-shaped convolutions for expanded, direction-sensitive receptive fields, followed by a Convolutional Block Attention Module (CBAM) for fine-grained spatial-channel feature recalibration; and (2) a Selective Attention Fusion Module (SAFM) that replaces conventional static skip connections with a spatially adaptive, learnable weighting mechanism to perform context-aware, cross-scale feature fusion.
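
The contrast between a static skip connection and a spatially adaptive one can be sketched in a few lines. This is only a toy, assuming a single-channel feature map and a per-pixel gate given by a sigmoid over a linear combination of the encoder and decoder values. The abstract does not specify SAFM's internals, so this is not the authors' module, and the names `selective_fusion`, `w_enc`, `w_dec`, and `bias` are hypothetical stand-ins for learned parameters.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def static_skip(enc, dec):
    """Static fusion: every pixel is merged with the same fixed rule.

    U-Net normally concatenates encoder and decoder features; a plain sum
    is used here only to keep the toy single-channel.
    """
    return [[e + d for e, d in zip(enc_row, dec_row)]
            for enc_row, dec_row in zip(enc, dec)]


def selective_fusion(enc, dec, w_enc, w_dec, bias):
    """Spatially adaptive fusion: a gate in (0, 1) is computed per pixel.

    The gate depends on the local feature values, so different locations
    blend encoder detail and decoder context in different proportions.
    """
    fused = []
    for enc_row, dec_row in zip(enc, dec):
        row = []
        for e, d in zip(enc_row, dec_row):
            gate = sigmoid(w_enc * e + w_dec * d + bias)  # hypothetical learned gate
            row.append(gate * e + (1.0 - gate) * d)
        fused.append(row)
    return fused


# Toy encoder map: one strong response (4.0) next to a weak one (1.0).
enc = [[4.0, 1.0]]
dec = [[0.0, 0.0]]
print(static_skip(enc, dec))
print(selective_fusion(enc, dec, w_enc=2.0, w_dec=0.0, bias=-4.0))
```

With these illustrative parameters the gate passes the strong encoder response almost unchanged and suppresses the weak one, whereas the static sum treats both pixels identically. In a real network the gate would be produced by learned layers over multi-channel features.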