Time-series Meets Complex Motion Modeling: Robust and Computational-effective Motion Predictor for Multi-object Tracking

arXiv cs.CV / 5/4/2026


Key Points

  • The paper proposes TCMP (Temporal Convolutional Motion Predictor), a motion predictor for multi-object tracking that models complex, non-linear real-world motion such as sudden stops and sharp turns.
  • Instead of relying on increasingly complex and computationally heavy generative models, it argues that a purpose-built, efficient architecture can deliver better practical performance.
  • TCMP is built on a modified Temporal Convolutional Network with dilated convolutions plus a regression head, enabling effective motion prediction over arbitrary temporal context lengths.
  • Experiments report new state-of-the-art results, improving HOTA from 62.3% to 63.4%, IDF1 from 63.0% to 65.0%, and AssA from 47.2% to 49.1% versus the prior best method.
  • These accuracy gains come with major efficiency benefits: TCMP uses only 0.014× the parameters and 0.05× the FLOPs of the previous state-of-the-art method.
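
To make the dilated-convolution idea concrete, here is a minimal pure-Python sketch of a causal dilated convolution stack with a linear regression head. It is a toy under stated assumptions, not the paper's architecture: the single scalar channel, the kernel size, the dilation schedule, the ReLU activation, and the function names (`receptive_field`, `causal_dilated_conv1d`, `predict_next_offset`) are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of time steps one output can see through a stack of
    causal dilated convolutions (one conv layer per dilation)."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def causal_dilated_conv1d(
    x: Sequence[float], weights: Sequence[float], dilation: int
) -> List[float]:
    """Causal 1-D convolution: out[t] uses x[t], x[t-d], x[t-2d], ...
    Positions before the start of the sequence count as zero (left padding),
    so the output has the same length as the input."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def predict_next_offset(
    history: Sequence[float],
    weights: Sequence[float],
    dilations: Sequence[int],
    head_weight: float,
    head_bias: float,
) -> float:
    """Toy predictor (assumed design): stacked dilated convs with ReLU,
    then a linear regression head applied to the last time step."""
    h = list(history)
    for d in dilations:
        h = [max(0.0, v) for v in causal_dilated_conv1d(h, weights, d)]
    return head_weight * h[-1] + head_bias


# Assumed configuration: kernel size 3, dilations doubling per layer.
print(receptive_field(3, [1, 2, 4, 8]))
# The same weights apply to a short or a long history without any change.
print(predict_next_offset([1.0] * 4, [0.5, 0.5], [1, 2], 2.0, 0.1))
print(predict_next_offset([1.0] * 40, [0.5, 0.5], [1, 2], 2.0, 0.1))
```

Because the convolution slides the same weights over the sequence, the predictor accepts a history of any length, which is one plausible reading of "arbitrary temporal context lengths". Doubling the dilation at each layer grows the receptive field exponentially with depth while the parameter count grows only linearly.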

Abstract

Multi-object tracking (MOT) is critical in numerous real-world applications, including surveillance, autonomous driving, and robotics. Accurately predicting object motion is fundamental to MOT, but current methods struggle with the complexities of real-world, non-linear motion (e.g., sudden stops, sharp turns). While recent research has gravitated towards increasingly complex and computationally expensive generative models to tackle this problem, their practical utility is often constrained. This paper challenges that paradigm, arguing that such complexity is not only unnecessary but can be outperformed by a more efficient, purpose-built approach. We introduce the Temporal Convolutional Motion Predictor (TCMP), a novel framework for MOT that leverages a modified Temporal Convolutional Network (TCN) featuring dilated convolutions and a regression head. This design allows for effective motion prediction across arbitrary temporal context lengths. Experimental results demonstrate that our approach achieves state-of-the-art performance, improving upon the previous best method on several key metrics: HOTA (a measure of overall tracking accuracy) increases from 62.3% to 63.4%, IDF1 (a measure of identity preservation) rises from 63.0% to 65.0%, and AssA (a measure of association accuracy) improves from 47.2% to 49.1%. Significantly, TCMP achieves this performance while being highly efficient: it has only 0.014 times the parameters and requires only 0.05 times the computational cost (FLOPs) of the SOTA method. These findings highlight the robustness of our method and its potential to advance MOT systems by ensuring adaptability, accuracy, and efficiency in complex tracking environments.
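
The reported figures can be restated as absolute and relative gains, and the efficiency ratios as reduction factors. The short calculation below uses only the numbers quoted in the abstract; the helper names `gains` and `reduction_factor` are hypothetical, and the derived values are simple arithmetic rather than results reported by the paper.

```python
# Metric values quoted in the abstract: (previous best, TCMP), in percent.
reported = {
    "HOTA": (62.3, 63.4),
    "IDF1": (63.0, 65.0),
    "AssA": (47.2, 49.1),
}


def gains(prev: float, new: float) -> tuple:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = round(new - prev, 1)
    relative = round(100.0 * (new - prev) / prev, 2)
    return absolute, relative


def reduction_factor(ratio: float) -> float:
    """Convert 'uses ratio x the resources' into 'about N times fewer'."""
    return round(1.0 / ratio, 1)


for name, (prev, new) in reported.items():
    absolute, relative = gains(prev, new)
    print(f"{name}: +{absolute} points ({relative}% relative)")

print("parameters:", reduction_factor(0.014), "x fewer")
print("FLOPs:", reduction_factor(0.05), "x fewer")
```

Read this way, the accuracy gains are between roughly 1 and 2 percentage points per metric, while the model is about 71 times smaller and about 20 times cheaper in FLOPs than the method it is compared against.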