S3KF: Spherical State-Space Kalman Filtering for Panoramic 3D Multi-Object Tracking

arXiv cs.RO / 3/31/2026


Key Points

  • The paper proposes S3KF, a panoramic 3D multi-object tracking framework designed for wide-area safety monitoring and robotic perception under strong panoramic distortion and occlusion.
  • It models object bearing with a geometry-consistent unit-sphere state representation (tangent-plane parameterization) and jointly estimates bearing along with box scale and depth dynamics using an extended spherical Kalman filtering pipeline.
  • S3KF fuses detections from a quad-fisheye camera rig with depth observations from a motorized rotating LiDAR to maintain stable target association in dynamic scenes.
  • The authors introduce a map-based ground-truth generation workflow using wearable localization devices aligned to a shared global LiDAR map, avoiding the need for motion-capture infrastructure.
  • Experiments on self-collected sequences report decimeter-level planar tracking accuracy, improved identity continuity versus a 2D panoramic baseline, and real-time onboard performance on a Jetson AGX Orin.
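The unit-sphere bearing state in the second key point can be illustrated with a small sketch. This is not the paper's actual implementation; it is a generic two-degree-of-freedom tangent-plane update on S², where an increment in the local tangent plane is mapped back onto the sphere via the exponential map. Function names and tolerances are illustrative assumptions.

```python
import numpy as np

def tangent_basis(b):
    # Build an orthonormal basis (t1, t2) of the tangent plane at unit bearing b.
    # Pick a helper axis not parallel to b to avoid a degenerate cross product.
    helper = np.array([1.0, 0.0, 0.0]) if abs(b[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    t1 = np.cross(b, helper)
    t1 /= np.linalg.norm(t1)
    t2 = np.cross(b, t1)
    return t1, t2

def retract(b, delta):
    # Map a 2-DoF tangent-plane increment delta = (d1, d2) back onto S^2.
    t1, t2 = tangent_basis(b)
    v = delta[0] * t1 + delta[1] * t2   # tangent vector at b
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return b
    # Exponential map on the sphere: rotate b toward v by angle theta.
    return np.cos(theta) * b + np.sin(theta) * (v / theta)
```

Because the increment lives in a 2D tangent plane, the bearing keeps exactly two degrees of freedom and the updated vector stays on the unit sphere by construction, avoiding the redundant third directional parameter of a Euclidean 3D formulation.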

Abstract

Panoramic multi-object tracking is important for industrial safety monitoring, wide-area robotic perception, and infrastructure-light deployment in large workspaces. In these settings, the sensing system must provide full-surround coverage, metric geometric cues, and stable target association under wide field-of-view distortion and occlusion. Existing image-plane trackers are tightly coupled to the camera projection and become unreliable in panoramic imagery, while conventional Euclidean 3D formulations introduce redundant directional parameters and do not naturally unify angular, scale, and depth estimation. In this paper, we present S³KF, a panoramic 3D multi-object tracking framework built on a motorized rotating LiDAR and a quad-fisheye camera rig. The key idea is a geometry-consistent state representation on the unit sphere S², where object bearing is modeled by a two-degree-of-freedom tangent-plane parameterization and jointly estimated with box scale and depth dynamics. Based on this state, we derive an extended spherical Kalman filtering pipeline that fuses panoramic camera detections with LiDAR depth observations for multimodal tracking. We further establish a map-based ground-truth generation pipeline using wearable localization devices registered to a shared global LiDAR map, enabling quantitative evaluation without motion-capture infrastructure. Experiments on self-collected real-world sequences show decimeter-level planar tracking accuracy, improved identity continuity over a 2D panoramic baseline in dynamic scenes, and real-time onboard operation on a Jetson AGX Orin platform. These results indicate that the proposed framework is a practical solution for panoramic perception and industrial-scale multi-object tracking. The project page can be found at https://kafeiyin00.github.io/S3KF/.
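The multimodal fusion described in the abstract can be sketched in a simplified form. The example below is not the paper's actual S³KF filter; it is a generic Kalman filter over an assumed toy state x = [azimuth, elevation, depth], updated sequentially with a camera measurement (bearing only) and a LiDAR measurement (depth only). All matrix values and noise levels are illustrative assumptions.

```python
import numpy as np

def kf_update(x, P, z, H, R):
    # Standard Kalman measurement update: innovation, gain, correction.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# Prior state and covariance (illustrative values).
x = np.array([0.10, 0.02, 5.0])      # azimuth (rad), elevation (rad), depth (m)
P = np.diag([0.05, 0.05, 1.0])

# Camera detection observes bearing only (azimuth, elevation).
H_cam = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
R_cam = np.diag([1e-3, 1e-3])
x, P = kf_update(x, P, np.array([0.12, 0.01]), H_cam, R_cam)

# Rotating LiDAR observes depth only.
H_lidar = np.array([[0.0, 0.0, 1.0]])
R_lidar = np.array([[0.05]])
x, P = kf_update(x, P, np.array([5.4]), H_lidar, R_lidar)
```

Each sensor corrects only the state components it observes, and the covariance shrinks accordingly; the paper's extended spherical formulation additionally handles the nonlinear unit-sphere bearing state, which this linear toy omits.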