Revisiting the Scale Loss Function and Gaussian-Shape Convolution for Infrared Small Target Detection

arXiv cs.CV / 4/14/2026


Key Points

  • The paper tackles training instability in infrared small target detection caused by non-monotonic scale loss functions, proposing a diff-based scale loss whose strictly monotonic gradients yield stable convergence.
  • It analyzes a family of four scale loss variants to explain how their geometric and gradient properties influence detection behavior and performance.
  • To improve spatial attention, the authors introduce Gaussian-shaped convolution with a learnable scale parameter that better matches the center-concentrated intensity profile of infrared small targets.
  • A rotated pinwheel mask further aligns the kernel with target orientation, with the discrete rotation trained via a straight-through estimator.
  • Experiments on IRSTD-1k, NUDT-SIRST, and SIRST-UAVB show consistent gains in mIoU, P_d, and F_a over state-of-the-art approaches, and the authors release anonymous code and pretrained models.
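The Gaussian-shaped convolution in the key points above can be illustrated as an envelope over an ordinary kernel. The sketch below is an assumption about the mechanism, not the paper's implementation: it builds an isotropic Gaussian mask over the kernel support and modulates the convolution weights with it, so the response concentrates on the center-peaked intensity profile of a small target. In the paper the scale parameter would be learnable; here `sigma` is a plain float, and both function names are hypothetical.

```python
import numpy as np

def gaussian_kernel_mask(ksize=5, sigma=1.0):
    """Isotropic Gaussian envelope over a ksize x ksize kernel support,
    peak-normalised so the kernel centre is unattenuated."""
    r = (ksize - 1) / 2.0
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    g = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return g / g.max()

def gaussian_shaped_weights(weights, sigma=1.0):
    """Modulate conv weights of shape (out_ch, in_ch, k, k) with the
    Gaussian envelope; the mask broadcasts over the channel dims."""
    k = weights.shape[-1]
    return weights * gaussian_kernel_mask(k, sigma)
```

A smaller `sigma` concentrates the envelope more tightly around the kernel centre, which is why letting the network learn it can adapt the receptive profile to target size.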

Abstract

Infrared small target detection still faces two persistent challenges: training instability from non-monotonic scale loss functions, and inadequate spatial attention due to generic convolution kernels that ignore the physical imaging characteristics of small targets. In this paper, we revisit both aspects. For the loss side, we propose a \emph{diff-based scale loss} that weights predictions according to the signed area difference between the predicted mask and the ground truth, yielding strictly monotonic gradients and stable convergence. We further analyze a family of four scale loss variants to understand how their geometric properties affect detection behavior. For the spatial side, we introduce \emph{Gaussian-shaped convolution} with a learnable scale parameter to match the center-concentrated intensity profile of infrared small targets, and augment it with a \emph{rotated pinwheel mask} that adaptively aligns the kernel with target orientation via a straight-through estimator. Extensive experiments on IRSTD-1k, NUDT-SIRST, and SIRST-UAVB demonstrate consistent improvements in mIoU, P_d, and F_a over state-of-the-art methods. We release our anonymous code and pretrained models.
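The diff-based scale loss described in the abstract can be sketched as follows. This is a hedged illustration under stated assumptions, not the paper's formula: it combines a soft-IoU term with a weight that grows monotonically in the magnitude of the signed area difference between the predicted mask and the ground truth, so the gradient with respect to predicted area never flips sign as the prediction approaches the correct scale. The function name and the exact weighting form (`1 + alpha * |diff|`) are assumptions.

```python
import numpy as np

def diff_scale_loss(pred, target, alpha=1.0, eps=1e-8):
    """Hypothetical diff-based scale loss.

    pred, target: float arrays in [0, 1] of the same shape.
    alpha: assumed hyperparameter controlling the scale weighting.
    """
    inter = np.sum(pred * target)
    union = np.sum(pred) + np.sum(target) - inter
    soft_iou = inter / (union + eps)
    # Signed area difference, normalised by the target area: positive when
    # the prediction over-segments, negative when it under-segments.
    diff = (np.sum(pred) - np.sum(target)) / (np.sum(target) + eps)
    # Weight monotone in |diff|, so moving the predicted area toward the
    # ground-truth area always decreases the loss.
    weight = 1.0 + alpha * np.abs(diff)
    return weight * (1.0 - soft_iou)
```

Under this form, an over-segmented prediction is penalised more heavily the further its area departs from the ground truth, which is the stabilising property the abstract attributes to the signed area difference.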