GateMOT: Q-Gated Attention for Dense Object Tracking

arXiv cs.CV · April 30, 2026


Key Points

  • Dense object tracking struggles to use standard attention because quadratic all-to-all interactions are too expensive for high-resolution motion estimation.
  • GateMOT introduces Q-Gated Attention, turning the Query into a learnable gating unit (Gating-Q) that probabilistically modulates Key features element-wise to select relevance without costly global aggregation.
  • Using parallel Q-Attention heads over a shared feature map, GateMOT produces consistent, task-specific representations for detection, motion estimation, and re-identification in a coupled multi-task decoder.
  • The method reports state-of-the-art results on BEE24 (HOTA 48.4, MOTA 67.8, IDF1 64.5) and performs strongly on other dense object tracking benchmarks, suggesting Q-Attention is transferable to similar dense tracking settings.

Abstract

While large models demonstrate the strong representational power of vanilla attention, this core mechanism cannot be directly applied to Dense Object Tracking: its quadratic all-to-all interactions are computationally prohibitive for dense motion estimation on high-resolution features. This mismatch prevents Dense Object Tracking from fully leveraging attention-based modeling in crowded and occlusion-heavy scenes. To address this challenge, we introduce GateMOT, an online tracking framework centered on Q-Gated Attention (Q-Attention), an efficient and spatially aware attention variant. Our key idea is to repurpose the Query from a similarity-conditioning term into a learnable gating unit. This Gating-Query (Gating-Q) produces a probabilistic gate that modulates Key features element-wise, enabling explicit relevance selection instead of costly global aggregation. Built on this mechanism, parallel Q-Attention heads transform one shared feature map into task-specific yet consistent representations for detection, motion, and re-identification, yielding a tightly coupled multi-task decoder with linear-complexity gating operations. GateMOT achieves state-of-the-art HOTA of 48.4, MOTA of 67.8, and IDF1 of 64.5 on BEE24, and demonstrates strong performance on additional Dense Object Tracking benchmarks. These results show that Q-Attention is a simple, effective, and transferable building block for attention-based modeling in dense tracking scenarios.
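To make the complexity argument concrete, the core idea of the gating mechanism can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the projection shapes, the sigmoid gate, and the head names (`detection`, `motion`, `reid`) are assumptions chosen to mirror the description above. The point it shows is that passing the Query through an element-wise probabilistic gate costs O(N·d), whereas softmax attention materializes an N×N similarity matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def q_gated_attention(x, w_q, w_k):
    """Hypothetical sketch of Q-Gated Attention.

    x: (N, d) flattened high-resolution feature map.
    Instead of softmax(Q K^T) V (quadratic in N), the Query is
    repurposed as a probabilistic gate applied element-wise to
    the Key features -- linear in N, no N x N matrix.
    """
    q = x @ w_q          # (N, d) Gating-Query features
    k = x @ w_k          # (N, d) Key features
    gate = sigmoid(q)    # probabilistic gate in (0, 1)
    return gate * k      # element-wise relevance selection

# Parallel heads over one shared feature map, one per sub-task
# (illustrative head names; the real decoder wiring differs).
N, d = 6, 4
x = rng.standard_normal((N, d))
heads = {name: (rng.standard_normal((d, d)), rng.standard_normal((d, d)))
         for name in ("detection", "motion", "reid")}
outputs = {name: q_gated_attention(x, wq, wk)
           for name, (wq, wk) in heads.items()}
```

Because the gate lies in (0, 1), each head can only attenuate Key features, never amplify them, which is one plausible reading of "explicit relevance selection" in the abstract.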