LEO: Graph Attention Network based Hybrid Multi Sensor Extended Object Fusion and Tracking for Autonomous Driving Applications

arXiv cs.LG / 4/3/2026


Key Points

  • The paper introduces LEO (Learned Extension of Objects), a spatio-temporal Graph Attention Network designed for hybrid multi-sensor fusion to estimate both the shape and trajectory of dynamic extended objects in autonomous driving.
  • It aims to combine Bayesian extended-object model robustness with deep learning adaptability by learning fusion weights and enforcing temporal consistency while representing multi-scale, complex geometries (including articulated vehicles).
  • LEO uses a task-specific parallelogram ground-truth formulation to train on challenging object forms without requiring the dense annotations typically demanded by alternative deep approaches.
  • The method is evaluated for real-time efficiency on the Mercedes-Benz DRIVE PILOT (SAE L3) dataset and is further validated on public datasets like View of Delft (VoD) to demonstrate cross-dataset generalization.
  • The approach is reported to generalize across sensor types, configurations, object classes, and regions, remaining robust for long-range targets.
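The learned fusion weights mentioned above are the hallmark of graph attention: each sensor track attends to its fusion candidates, and the attention coefficients act as data-dependent fusion weights. The paper's exact architecture is not reproduced here; the following is a minimal numpy sketch of one Velickovic-style graph-attention aggregation step over sensor-track nodes, with all names (`gat_fuse`, the toy features, the adjacency) being illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gat_fuse(node_feats, adj, W, a, leaky_slope=0.2):
    """One graph-attention aggregation step (GAT-style, illustrative).

    node_feats: (N, F) per-track features (e.g. position, extent, velocity)
    adj:        (N, N) binary adjacency; 1 where two sensor tracks may fuse
    W:          (F, F') shared linear projection
    a:          (2*F',) attention vector, split into source/destination halves
    Returns fused (N, F') features and the (N, N) attention (fusion) weights.
    """
    h = node_feats @ W                       # project features
    Fp = h.shape[1]
    # pairwise logits e_ij = LeakyReLU(a^T [h_i || h_j]), decomposed as
    # a source term plus a destination term
    src = h @ a[:Fp]
    dst = h @ a[Fp:]
    e = src[:, None] + dst[None, :]
    e = np.where(e > 0, e, leaky_slope * e)  # LeakyReLU
    e = np.where(adj > 0, e, -1e9)           # mask non-edges
    alpha = softmax(e, axis=1)               # learned per-edge fusion weights
    return alpha @ h, alpha

# toy example: 3 sensor tracks; the third is isolated from the first two
feats = np.array([[1.0, 0.0, 0.5],
                  [0.9, 0.1, 0.6],
                  [5.0, 5.0, 0.2]])
adj = np.array([[1, 1, 0],
                [1, 1, 0],
                [0, 0, 1]])
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 2))
a = rng.standard_normal(4)
fused, alpha = gat_fuse(feats, adj, W, a)
```

Each row of `alpha` sums to 1, so the fused feature for a track is a convex combination of its neighbors' projected features; masked (non-adjacent) pairs receive near-zero weight, which is how the adjacency structure constrains which sensor tracks can fuse.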

Abstract

Accurate shape and trajectory estimation of dynamic objects is essential for reliable automated driving. Classical Bayesian extended-object models offer theoretical robustness and efficiency but depend on the completeness of a-priori and update-likelihood functions, while deep learning methods bring adaptability at the cost of dense annotations and high compute. We bridge these strengths with LEO (Learned Extension of Objects), a spatio-temporal Graph Attention Network that fuses multi-modal production-grade sensor tracks to learn adaptive fusion weights, ensure temporal consistency, and represent multi-scale shapes. Using a task-specific parallelogram ground-truth formulation, LEO models complex geometries (e.g., articulated trucks and trailers) and generalizes across sensor types, configurations, object classes, and regions, remaining robust for challenging and long-range targets. Evaluations on the Mercedes-Benz DRIVE PILOT SAE L3 dataset demonstrate real-time computational efficiency suitable for production systems; additional validation on public datasets such as View of Delft (VoD) further confirms cross-dataset generalization.
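A parallelogram extent generalizes the usual oriented bounding box: the two edge directions need not be perpendicular, which lets the label follow sheared footprints such as an articulated truck mid-turn. The abstract does not spell out the parameterization, so the snippet below is only one plausible sketch, representing the extent by a center and two edge half-vectors (all names hypothetical).

```python
import numpy as np

def parallelogram_corners(center, u, v):
    """Corners of a parallelogram extent, counter-clockwise.

    Illustrative parameterization (not the paper's): `center` is the
    centroid, `u` and `v` are the two edge half-vectors. An oriented
    rectangle is the special case where u and v are perpendicular.
    """
    c = np.asarray(center, dtype=float)
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return np.array([c - u - v, c + u - v, c + u + v, c - u + v])

def parallelogram_area(u, v):
    # full edges are 2u and 2v, so the area is |2u x 2v| = 4 |u x v|
    return 4.0 * abs(np.cross(u, v))

# axis-aligned example: a 4 m x 2 m rectangle centered at the origin
corners = parallelogram_corners([0.0, 0.0], [2.0, 0.0], [0.0, 1.0])
```

Because the centroid is always the mean of the four corners, such a label can be regressed directly (center plus two vectors) without corner-ordering ambiguity, which may be part of why this formulation avoids the dense annotations other deep approaches require.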