Learning Where to Embed: Noise-Aware Positional Embedding for Query Retrieval in Small-Object Detection

arXiv cs.CV / 4/17/2026


Key Points

The paper addresses inefficiency and background-induced “query noise” in transformer-based small-object detectors by refining where positional information is embedded.
It proposes HELP (Heatmap-guided Embedding Learning Paradigm), which selectively preserves positional encodings in foreground-salient regions while suppressing background clutter.
The core method, Heatmap-guided Positional Embedding (HPE), fuses positional and semantic information in both encoder and decoder, using a gradient-based mask filter to improve query retrieval.
To handle sparse features in small, complex targets, the approach integrates Linear-Snake Convolution to enrich retrieval-relevant representations.
Experiments show substantial model compression: decoder layers drop from 8 to 3 and parameters fall by 59.4% (66.3M vs. 163M), while accuracy gains are maintained. The gradient-based heatmap supervision is used only during training, so it adds no gradient computation at inference.
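The idea of embedding position only where the foreground is salient can be illustrated with a minimal sketch. It is not the paper's HPE implementation: it assumes a precomputed per-token heatmap with scores in [0, 1], uses a standard 1-D sinusoidal encoding, and applies a hard gate. The function names and the `threshold` parameter are hypothetical, chosen for illustration only.

```python
import math


def sinusoidal_encoding(position, dim):
    """Standard 1-D sinusoidal positional encoding for a single position."""
    enc = []
    for i in range(dim):
        angle = position / (10000 ** (2 * (i // 2) / dim))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc


def heatmap_gated_embedding(features, heatmap, threshold=0.5):
    """Add positional encodings only to tokens the heatmap marks as foreground.

    features:  list of token feature vectors, one per flattened position
    heatmap:   list of saliency scores in [0, 1], one per token (assumed given)
    threshold: hypothetical cut-off separating foreground from background
    """
    dim = len(features[0])
    out = []
    for pos, (feat, score) in enumerate(zip(features, heatmap)):
        if score >= threshold:
            # Foreground token: keep its positional signal.
            pe = sinusoidal_encoding(pos, dim)
            out.append([f + p for f, p in zip(feat, pe)])
        else:
            # Background token: positional encoding is suppressed.
            out.append(list(feat))
    return out


# Usage: three tokens, the middle one is background.
features = [[0.0] * 4 for _ in range(3)]
heatmap = [0.9, 0.1, 0.7]
gated = heatmap_gated_embedding(features, heatmap)
```

In this sketch the background token keeps its original features, so downstream attention receives no positional cue from it. A soft gate (multiplying the encoding by the heatmap score) would be an equally plausible variant.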

Abstract

Transformer-based detectors have advanced small-object detection, but they often remain inefficient and vulnerable to background-induced query noise, which motivates deep decoders to refine low-quality queries. We present HELP (Heatmap-guided Embedding Learning Paradigm), a noise-aware positional-semantic fusion framework that studies where to embed positional information by selectively preserving positional encodings in foreground-salient regions while suppressing background clutter. Within HELP, we introduce Heatmap-guided Positional Embedding (HPE) as the core embedding mechanism and visualize it with a heatbar for interpretable diagnosis and fine-tuning. HPE is integrated into both the encoder and decoder: it guides noise-suppressed feature encoding by injecting heatmap-aware positional encoding, and it enables high-quality query retrieval by filtering background-dominant embeddings via a gradient-based mask filter before decoding. To address feature sparsity in complex small targets, we integrate Linear-Snake Convolution to enrich retrieval-relevant representations. The gradient-based heatmap supervision is used during training only, incurring no additional gradient computation at inference. As a result, our design reduces decoder layers from eight to three and achieves a 59.4% parameter reduction (66.3M vs. 163M) while maintaining consistent accuracy gains under a reduced compute budget across benchmarks. Code Repository: https://github.com/yidimopozhibai/Noise-Suppressed-Query-Retrieval
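The query-retrieval step described in the abstract, where background-dominant embeddings are filtered out before decoding, can be sketched roughly as follows. This is an assumption-laden illustration rather than the authors' gradient-based mask filter: it applies a plain threshold to a precomputed heatmap and then keeps the highest-scoring tokens. The names `filter_queries`, `mask_threshold`, and `max_queries` are hypothetical.

```python
def filter_queries(embeddings, heatmap, mask_threshold=0.5, max_queries=300):
    """Drop background-dominant embeddings, then keep the highest-scoring ones.

    embeddings:     list of encoder token embeddings
    heatmap:        list of saliency scores in [0, 1], one per token (assumed given)
    mask_threshold: hypothetical cut-off below which a token counts as background
    max_queries:    hypothetical cap on the number of queries passed to the decoder
    Returns the kept token indices (in original order) and their embeddings.
    """
    # Step 1: mask out tokens whose saliency falls below the threshold.
    candidates = [
        (score, idx) for idx, score in enumerate(heatmap) if score >= mask_threshold
    ]
    # Step 2: rank survivors by saliency (ties broken by position) and truncate.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    kept = sorted(idx for _, idx in candidates[:max_queries])
    return kept, [embeddings[i] for i in kept]


# Usage: four tokens, at most two queries are retained.
embeddings = [[1.0], [2.0], [3.0], [4.0]]
heatmap = [0.9, 0.2, 0.6, 0.8]
kept, queries = filter_queries(embeddings, heatmap, mask_threshold=0.5, max_queries=2)
```

Passing fewer, cleaner queries to the decoder is the intuition behind why a shallower decoder might suffice. The sketch only shows the selection logic, not the training signal that produces the heatmap.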
