SCT-MOT: Enhancing Air-to-Air Multiple UAVs Tracking with Swarm-Coupled Motion and Trajectory Guidance

arXiv cs.CV / 4/9/2026


Key Points

  • The paper highlights that tracking swarms of small UAVs in air-to-air settings is difficult because their group motion is nonlinear and coupled and their visual cues are weak, which leads to detection failures, fragmented tracks, and identity switches.
  • It proposes SCT-MOT, which combines a Swarm Motion-Aware Trajectory Prediction (SMTP) module with a Trajectory-Guided Spatio-Temporal Feature Fusion (TG-STFF) module to model dependencies between UAVs and improve temporal consistency.
  • SMTP jointly models historical trajectories and posture-aware appearance features from a swarm perspective to produce more accurate forecasts of coupled group trajectories.
  • TG-STFF aligns predicted positions with historical visual cues and fuses them with current-frame features to strengthen spatio-temporal discrimination for weak targets.
  • Experiments on AIRMOT, MOT-FLY, and UAVSwarm show that SMTP produces more accurate trajectory forecasts and yields a reported 1.21% IDF1 gain over the EqMotion trajectory prediction module when both are plugged into the same MOT framework; the full SCT-MOT tracker also outperforms state-of-the-art trackers in accuracy and robustness across multiple metrics in complex swarm scenarios.
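IDF1, the metric behind the reported 1.21% gain, is the identity F1 score used in MOT benchmarks. It measures what fraction of detections are assigned to the correct identity over whole trajectories, so identity switches and fragmented tracks lower it directly. Given the identity-level counts IDTP, IDFP, and IDFN obtained after a one-to-one matching of ground-truth and predicted trajectories, IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN), which equals the harmonic mean of ID precision (IDP) and ID recall (IDR). The minimal sketch below assumes that matching has already been done (for example by an evaluation toolkit) and only evaluates the formulas; the function names and the example counts are illustrative, not taken from the paper.

```python
def id_precision(idtp: int, idfp: int) -> float:
    """IDP: fraction of predicted detections carrying the correct identity."""
    return idtp / (idtp + idfp) if idtp + idfp else 0.0


def id_recall(idtp: int, idfn: int) -> float:
    """IDR: fraction of ground-truth detections covered by the correct identity."""
    return idtp / (idtp + idfn) if idtp + idfn else 0.0


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN), the harmonic mean of IDP and IDR."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


# Illustrative identity-level counts (hypothetical, not from the paper).
idtp, idfp, idfn = 900, 100, 300
p, r = id_precision(idtp, idfp), id_recall(idtp, idfn)
print(round(p, 4), round(r, 4), round(idf1(idtp, idfp, idfn), 4))  # prints: 0.9 0.75 0.8182
```

Because IDTP appears in both the numerator and denominator, every frame in which a UAV keeps its correct identity raises the score, which is why better motion forecasts that reduce identity switches show up as IDF1 gains.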

Abstract

Air-to-air tracking of swarm UAVs presents significant challenges due to complex nonlinear group motion and weak visual cues for small objects, which often cause detection failures, trajectory fragmentation, and identity switches. Although existing methods have attempted to improve performance by incorporating trajectory prediction, they model each object independently, neglecting swarm-level motion dependencies. Their limited integration of motion prediction and appearance representation also weakens the spatio-temporal consistency required for tracking in visually ambiguous and cluttered environments, making it difficult to maintain coherent trajectories and reliable associations. To address these challenges, we propose SCT-MOT, a tracking framework that integrates Swarm-Coupled motion modeling and Trajectory-guided feature fusion. First, we develop a Swarm Motion-Aware Trajectory Prediction (SMTP) module that jointly models historical trajectories and posture-aware appearance features from a swarm-level perspective, enabling more accurate forecasting of nonlinear, coupled group trajectories. Second, we design a Trajectory-Guided Spatio-Temporal Feature Fusion (TG-STFF) module that aligns predicted positions with historical visual cues and deeply integrates them with current-frame features, enhancing temporal consistency and spatial discriminability for weak objects. Extensive experiments on three public air-to-air swarm UAV tracking datasets, AIRMOT, MOT-FLY, and UAVSwarm, demonstrate that SMTP achieves more accurate trajectory forecasts and yields a 1.21% IDF1 improvement over the state-of-the-art trajectory prediction module EqMotion when integrated into the same MOT framework. Overall, SCT-MOT consistently achieves superior accuracy and robustness compared to state-of-the-art trackers across multiple metrics in complex swarm scenarios.
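The abstract describes a two-stage data flow: forecast where each UAV will be using swarm-level context, then use that forecast to bring historical appearance cues into the current frame. The sketch below illustrates that flow with deliberately simple, hypothetical stand-ins; it is not the paper's SMTP or TG-STFF module. Assumptions: the forecast is a constant-velocity step whose velocity is blended with a distance-weighted (softmax) average of the other UAVs' velocities, with invented parameters `coupling` and `tau`; it ignores the posture-aware appearance features SMTP also uses; features are read from synthetic feature maps by bilinear sampling; and the fusion rule is a similarity-gated weighted sum. Positions are assumed to be already expressed in feature-map pixel coordinates.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Feature = List[float]


def independent_forecast(track: Sequence[Point]) -> Point:
    """Constant-velocity step from one UAV's own last two positions."""
    (x0, y0), (x1, y1) = track[-2], track[-1]
    return (2 * x1 - x0, 2 * y1 - y0)


def swarm_coupled_forecast(tracks: Sequence[Sequence[Point]],
                           coupling: float = 0.5, tau: float = 5.0) -> List[Point]:
    """Hypothetical coupling rule: blend each UAV's velocity with a
    distance-weighted mean of the other UAVs' velocities."""
    last = [t[-1] for t in tracks]
    vel = [(t[-1][0] - t[-2][0], t[-1][1] - t[-2][1]) for t in tracks]
    preds: List[Point] = []
    for i, (xi, yi) in enumerate(last):
        others = [j for j in range(len(tracks)) if j != i]
        if not others:  # lone UAV: no swarm context, keep its own motion
            preds.append(independent_forecast(tracks[i]))
            continue
        # Softmax over negative distances: closer neighbours weigh more.
        scores = [-math.dist(last[i], last[j]) / tau for j in others]
        top = max(scores)
        w = [math.exp(s - top) for s in scores]
        z = sum(w)
        nvx = sum(wj * vel[j][0] for wj, j in zip(w, others)) / z
        nvy = sum(wj * vel[j][1] for wj, j in zip(w, others)) / z
        vx = (1 - coupling) * vel[i][0] + coupling * nvx
        vy = (1 - coupling) * vel[i][1] + coupling * nvy
        preds.append((xi + vx, yi + vy))
    return preds


def bilinear_sample(fmap: Sequence[Sequence[Feature]], x: float, y: float) -> Feature:
    """Read a C-dim feature at a sub-pixel (x, y), clamped to the map border."""
    h, w = len(fmap), len(fmap[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    return [(1 - ax) * (1 - ay) * fmap[y0][x0][k] + ax * (1 - ay) * fmap[y0][x1][k]
            + (1 - ax) * ay * fmap[y1][x0][k] + ax * ay * fmap[y1][x1][k]
            for k in range(len(fmap[0][0]))]


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fuse(hist: Feature, cur: Feature) -> Feature:
    """Hypothetical fusion: the more the historical cue agrees with the
    current feature (scaled dot product), the more weight it receives."""
    g = _sigmoid(sum(a * b for a, b in zip(hist, cur)) / math.sqrt(len(cur)))
    return [g * a + (1 - g) * b for a, b in zip(hist, cur)]


if __name__ == "__main__":
    # Three UAVs, positions in feature-map pixel coordinates (x, y).
    tracks = [[(2.0, 2.0), (3.0, 2.0)], [(2.0, 4.0), (3.0, 4.5)], [(6.0, 3.0), (6.5, 3.0)]]
    preds = swarm_coupled_forecast(tracks)
    h, w, c = 8, 10, 4
    # Synthetic C-channel feature maps for the previous and current frames.
    prev_fmap = [[[math.sin(0.3 * x + k) * math.cos(0.2 * y) for k in range(c)]
                  for x in range(w)] for y in range(h)]
    cur_fmap = [[[math.sin(0.3 * x + k + 0.1) * math.cos(0.2 * y) for k in range(c)]
                 for x in range(w)] for y in range(h)]
    for track, pred in zip(tracks, preds):
        hist = bilinear_sample(prev_fmap, *track[-1])  # cue at last known position
        cur = bilinear_sample(cur_fmap, *pred)         # current frame at the forecast
        print([round(v, 2) for v in pred], [round(f, 3) for f in fuse(hist, cur)])
```

Setting `coupling=0` recovers the per-object constant-velocity baseline that the abstract criticizes for modeling each object independently; raising it lets a UAV's forecast borrow motion from nearby swarm members, so the forecast, and therefore where current-frame features are read, reflects group motion.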