TFusionOcc: T-Primitive Based Object-Centric Multi-Sensor Fusion Framework for 3D Occupancy Prediction

arXiv cs.RO / 4/22/2026


Key Points

  • The paper introduces TFusionOcc, an object-centric multi-sensor fusion framework for 3D semantic occupancy prediction aimed at improving autonomous-vehicle scene understanding.
  • It addresses the limitations of voxel-based and Gaussian-primitive approaches by proposing T-primitive geometry primitives based on the Student's t-distribution, including a plain T-primitive, a T-Superquadric, and a deformable T-Superquadric with inverse warping.
  • A unified probabilistic formulation is developed using the Student's t-distribution and a T-mixture model (TMM) to jointly model both occupancy and semantic labels.
  • The method uses a tightly coupled multi-stage fusion architecture to integrate camera and LiDAR cues more effectively.
  • Experiments on nuScenes achieve state-of-the-art results, and evaluations on nuScenes-C indicate strong robustness to most corruption types, with code to be released on GitHub.
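The paper's exact primitive parameterization is not reproduced here, but the core idea behind a T-mixture model can be sketched in a few lines: each object-centric primitive contributes a multivariate Student's t component, and the occupancy density at a query point is their weighted sum. The sketch below uses SciPy's `multivariate_t` with made-up toy parameters; the function name `tmm_density` and all component values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.stats import multivariate_t

def tmm_density(x, weights, locs, shapes, dfs):
    """Density of a Student's t mixture (TMM-style) at point(s) x.

    Each component is a multivariate Student's t with location `mu`,
    shape matrix `S`, and degrees of freedom `nu`; `weights` should
    sum to 1. All parameters here are hypothetical examples.
    """
    return sum(
        w * multivariate_t.pdf(x, loc=mu, shape=S, df=nu)
        for w, mu, S, nu in zip(weights, locs, shapes, dfs)
    )

# Two toy 3D components, standing in for two object-centric primitives.
weights = [0.6, 0.4]
locs = [np.zeros(3), np.array([2.0, 0.0, 0.0])]
shapes = [np.eye(3), 0.5 * np.eye(3)]
dfs = [3.0, 5.0]  # lower df gives heavier tails than a Gaussian

# Occupancy density at a query point near the first component.
p = tmm_density(np.array([0.5, 0.0, 0.0]), weights, locs, shapes, dfs)
print(p > 0.0)
```

The heavy tails of the Student's t (relative to a Gaussian) are what make such primitives more tolerant of outliers and better suited to elongated or asymmetric structures, which is the motivation the paper gives for moving beyond Gaussian primitives.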

Abstract

The prediction of 3D semantic occupancy enables autonomous vehicles (AVs) to perceive the fine-grained geometric and semantic scene structure for safe navigation and decision-making. Existing methods mainly rely on either voxel-based representations, which incur redundant computation over empty regions, or on object-centric Gaussian primitives, which are limited in modeling complex, non-convex, and asymmetric structures. In this paper, we present TFusionOcc, a T-primitive-based object-centric multi-sensor fusion framework for 3D semantic occupancy prediction. Specifically, we introduce a family of Student's t-distribution-based T-primitives, including the plain T-primitive, T-Superquadric, and deformable T-Superquadric with inverse warping, where the deformable T-Superquadric serves as the key geometry-enhancing primitive. We further develop a unified probabilistic formulation based on the Student's t-distribution and the T-mixture model (TMM) to jointly model occupancy and semantics, and design a tightly coupled multi-stage fusion architecture to effectively integrate camera and LiDAR cues. Extensive experiments on nuScenes show state-of-the-art performance, while additional evaluations on nuScenes-C demonstrate strong robustness under most corruption scenarios. The code will be available at: https://github.com/DanielMing123/TFusionOcc