Rethinking IRSTD: Single-Point Supervision Guided Encoder-only Framework is Enough for Infrared Small Target Detection

arXiv cs.CV / 4/8/2026


Key Points

  • The paper argues that infrared small target detection (IRSTD) should emphasize target localization rather than pixel-level encoder–decoder segmentation, because targets are only a few pixels and often have blurred boundaries from clutter.
  • It reformulates IRSTD as a centroid regression problem and introduces SPIRE, a Single-Point Supervision guided Infrared Probabilistic Response Encoding method that is designed to work with an encoder-only, end-to-end pipeline.
  • SPIRE uses Point-Response Prior Supervision (PRPS) to convert single-point labels into probabilistic response maps that better match infrared point-target characteristics.
  • A High-Resolution Probabilistic Encoder (HRPE) is proposed to directly regress the output without decoder reconstruction, aiming to reduce optimization instability under sparse target distributions.
  • Experiments on benchmarks such as SIRST-UAVB and SIRST4 show competitive target-level detection with low false alarm rates and significantly lower computational cost, and the code is released publicly.
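
To make the point-to-response-map idea concrete, here is a minimal sketch of how a single-point label could be rendered as a dense response map. This summary does not give the exact form of PRPS, so the isotropic Gaussian, the `sigma` parameter, and the function name `point_response_map` are all assumptions chosen for illustration and are not taken from the SPIRE code.

```python
import math


def point_response_map(height, width, points, sigma=1.5):
    """Render single-point labels as a dense response map with values in [0, 1].

    Assumption (not from the paper): each labelled point (row, col) is modelled
    as an isotropic Gaussian with standard deviation `sigma`. Where responses
    from several targets overlap, the per-pixel maximum is kept.
    """
    response = [[0.0] * width for _ in range(height)]
    for cy, cx in points:
        for y in range(height):
            for x in range(width):
                squared_distance = (y - cy) ** 2 + (x - cx) ** 2
                value = math.exp(-squared_distance / (2.0 * sigma ** 2))
                if value > response[y][x]:
                    response[y][x] = value
    return response


# One target labelled at the centre of a 9x9 image.
response = point_response_map(9, 9, [(4, 4)])
```

Under this assumption a single labelled pixel supervises a small neighbourhood rather than one location, which is one plausible way to read "increasing effective supervision density".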

Abstract

Infrared small target detection (IRSTD) aims to separate small targets from cluttered backgrounds. Extensive research has been devoted to the "encoder-decoder" segmentation paradigm guided by pixel-level supervision. Although these methods achieve promising performance, they neglect the fact that small targets occupy only a few pixels and usually have blurred boundaries caused by cluttered backgrounds. Based on this observation, we argue that the first principle of IRSTD should be target localization rather than segmenting the entire target region together with indistinguishable background noise. In this paper, we reformulate IRSTD as a centroid regression task and propose a novel Single-Point Supervision guided Infrared Probabilistic Response Encoding method (namely, SPIRE). This is challenging because of the mismatch between the reduced supervision and the equivalent network output. Specifically, we first design a Point-Response Prior Supervision (PRPS), which transforms single-point annotations into probabilistic response maps consistent with infrared point-target response characteristics. We pair it with a High-Resolution Probabilistic Encoder (HRPE) that enables encoder-only, end-to-end regression without decoder reconstruction. By preserving high-resolution features and increasing effective supervision density, SPIRE alleviates optimization instability under sparse target distributions. Finally, extensive experiments on various IRSTD benchmarks, including SIRST-UAVB and SIRST4, demonstrate that SPIRE achieves competitive target-level detection performance with a consistently low false alarm rate (Fa) and significantly reduced computational cost. Code is publicly available at: https://github.com/NIRIXIANG/SPIRE-IRSTD.
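
The abstract frames detection as centroid regression but does not say how target positions are read out of the predicted response map. The sketch below shows one generic readout: keep local maxima above a threshold, then refine each to a sub-pixel, intensity-weighted centroid. The function name `decode_centroids` and the `threshold` and `radius` parameters are hypothetical, and this is not claimed to be the procedure SPIRE uses.

```python
def decode_centroids(response, threshold=0.5, radius=1):
    """Return sub-pixel (row, col) centroids of local maxima in a response map.

    Generic readout for illustration only. A pixel counts as a peak if it is at
    least `threshold` and no pixel within `radius` is larger. Ties between
    equal neighbours go to the pixel that comes first in scan order.
    """
    height, width = len(response), len(response[0])
    centroids = []
    for y in range(height):
        for x in range(width):
            peak = response[y][x]
            if peak < threshold:
                continue
            rows = range(max(0, y - radius), min(height, y + radius + 1))
            cols = range(max(0, x - radius), min(width, x + radius + 1))
            window = [(j, i, response[j][i]) for j in rows for i in cols]
            if any(v > peak or (v == peak and (j, i) < (y, x))
                   for j, i, v in window):
                continue  # a neighbour dominates, so this is not the peak
            # Intensity-weighted centroid over the local window.
            total = sum(v for _, _, v in window)
            cy = sum(j * v for j, _, v in window) / total
            cx = sum(i * v for _, i, v in window) / total
            centroids.append((cy, cx))
    return centroids


# Toy 5x5 map: a peak at (2, 2) with a weaker response to its right.
toy_map = [[0.0] * 5 for _ in range(5)]
toy_map[2][2] = 1.0
toy_map[2][3] = 0.5
detections = decode_centroids(toy_map)
```

In the toy map the weaker neighbour pulls the estimated column slightly to the right of the peak pixel, which shows how a probabilistic response can carry sub-pixel location information without any segmentation mask.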
