AI Navigate

ModTrack: Sensor-Agnostic Multi-View Tracking via Identity-Informed PHD Filtering with Covariance Propagation

arXiv cs.CV / 3/18/2026


Key Points

  • ModTrack proposes a modular MV-MOT pipeline that confines learning to the detection and feature extraction stage, while keeping fusion, association, and tracking in closed-form analytical methods.
  • The method converts each sensor's output into calibrated position-covariance pairs and uses cross-view clustering with precision-weighted fusion to produce unified estimates for identity assignment and temporal tracking with quantified uncertainty.
  • It employs a feedback-coupled, identity-informed GM-PHD filter with HMM motion modes to robustly maintain identities under missed detections and heavy occlusion.
  • ModTrack achieves 95.5 IDF1 and 91.4 MOTA on WildTrack, surpassing all prior modular methods by over 21 points and approaching state-of-the-art end-to-end methods; the same tracker core transfers to MultiviewX and RadarScenes with only the perception module replaced.
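
The precision-weighted fusion mentioned in the key points can be illustrated with a minimal sketch. It assumes the textbook inverse-covariance (information-form) rule and independent errors across views; the function name `fuse_precision_weighted` and the example values are hypothetical, and the paper's exact formulation may differ.

```python
import numpy as np

def fuse_precision_weighted(zs, Rs):
    """Fuse position estimates z_i with covariances R_i into one (z_hat, R_hat).

    Assumption: errors are independent across views, so precisions
    (inverse covariances) add. This is the standard rule, not
    necessarily the exact one used by ModTrack.
    """
    precisions = [np.linalg.inv(R) for R in Rs]        # R_i^{-1}
    R_hat = np.linalg.inv(np.sum(precisions, axis=0))  # (sum_i R_i^{-1})^{-1}
    # Precision-weighted sum of the measurements, mapped back to position space.
    weighted = np.sum([W @ z for W, z in zip(precisions, zs)], axis=0)
    z_hat = R_hat @ weighted
    return z_hat, R_hat

# Two hypothetical views of the same ground-plane point.
z_a, R_a = np.array([0.0, 0.0]), np.eye(2)            # confident in x and y
z_b, R_b = np.array([4.0, 0.0]), np.diag([3.0, 1.0])  # less confident in x
z_hat, R_hat = fuse_precision_weighted([z_a, z_b], [R_a, R_b])
print(z_hat, R_hat)
```

In this example the fused x-coordinate lands at 1.0 rather than the midpoint 2.0, because the first view carries three times the precision of the second along x. The fused covariance is smaller than either input covariance.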

Abstract

Multi-View Multi-Object Tracking (MV-MOT) aims to localize and maintain consistent identities of objects observed by multiple sensors. This task is challenging, as viewpoint changes and occlusion disrupt identity consistency across views and time. Recent end-to-end approaches address this by jointly learning 2D Bird's Eye View (BEV) representations and identity associations, achieving high tracking accuracy. However, these methods offer no principled uncertainty accounting and remain tightly coupled to their training configuration, limiting generalization across sensor layouts, modalities, or datasets without retraining. We propose ModTrack, a modular MV-MOT system that matches end-to-end performance while providing cross-modal, sensor-agnostic generalization and traceable uncertainty. ModTrack confines learning to the Detection and Feature Extraction stage of the MV-MOT pipeline and performs all fusion, association, and tracking with closed-form analytical methods. Our design reduces each sensor's output to calibrated position-covariance pairs $(\mathbf{z}, R)$; cross-view clustering and precision-weighted fusion then yield unified estimates $(\hat{\mathbf{z}}, \hat{R})$ for identity assignment and temporal tracking. A feedback-coupled, identity-informed Gaussian Mixture Probability Hypothesis Density (GM-PHD) filter with HMM motion modes uses these fused estimates to maintain identities under missed detections and heavy occlusion. ModTrack achieves 95.5 IDF1 and 91.4 MOTA on WildTrack, surpassing all prior modular methods by over 21 points and rivaling state-of-the-art end-to-end methods while providing deployment flexibility they cannot. Specifically, the same tracker core transfers unchanged to MultiviewX and RadarScenes, with only perception-module replacement required to extend to new domains and sensor modalities.
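
To make the role of the fused covariance in tracking concrete, the following is a minimal sketch of a standard linear-Gaussian GM-PHD measurement update in which each measurement carries its own covariance. It deliberately omits the identity-informed feedback and HMM motion modes described in the abstract, since the abstract gives no detail on them. The function names, state layout `[x, y, vx, vy]`, and parameters `p_detect` and `clutter_density` are all assumptions made for illustration.

```python
import numpy as np

def gaussian_pdf(z, mean, cov):
    """Density of N(mean, cov) evaluated at z."""
    d = z - mean
    norm = np.sqrt((2.0 * np.pi) ** len(z) * np.linalg.det(cov))
    return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / norm)

def gmphd_update(components, measurements, H, p_detect, clutter_density):
    """One standard GM-PHD update step (not ModTrack's identity-informed variant).

    components:   list of (weight, mean, covariance) from the prediction step
    measurements: list of (z, R) pairs, e.g. fused position and covariance
    """
    # Missed-detection hypotheses: each component survives with reduced weight.
    updated = [((1.0 - p_detect) * w, m, P) for w, m, P in components]
    identity = np.eye(H.shape[1])
    for z, R in measurements:
        hypotheses = []
        for w, m, P in components:
            S = H @ P @ H.T + R                # innovation covariance uses this z's own R
            K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
            m_new = m + K @ (z - H @ m)
            P_new = (identity - K @ H) @ P
            w_new = p_detect * w * gaussian_pdf(z, H @ m, S)
            hypotheses.append((w_new, m_new, P_new))
        # Normalize against clutter plus all components competing for z.
        total = clutter_density + sum(h[0] for h in hypotheses)
        if total > 0.0:
            updated.extend((w_h / total, m_h, P_h) for w_h, m_h, P_h in hypotheses)
    return updated

# Hypothetical single-target example with state [x, y, vx, vy].
H = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
prior = [(1.0, np.zeros(4), np.eye(4))]
fused = [(np.array([1.0, 0.0]), np.eye(2))]
posterior = gmphd_update(prior, fused, H, p_detect=0.9, clutter_density=0.0)
print(sum(w for w, _, _ in posterior))
```

The sum of the posterior weights is the filter's estimate of the expected number of targets. A measurement with a larger covariance yields a smaller Kalman gain and so pulls the component mean less, which is how per-measurement uncertainty carries through to the track state.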