GHOST: Ground-projected Hypotheses from Observed Structure-from-Motion Trajectories

arXiv cs.RO / 3/24/2026

Key Points

  • The paper introduces “GHOST,” a scalable self-supervised method that segments feasible vehicle trajectories from monocular images using dashcam-derived ego-motion as implicit supervision.
  • It recovers camera motion via monocular structure-from-motion, projects trajectories onto the ground plane to create spatial masks of traversed regions without manual annotation, and then trains a deep segmentation network on these auto-generated labels.
  • At inference, the network predicts motion-conditioned path proposals from a single RGB image, avoiding explicit reliance on road or lane markings while learning scene layout, lane topology, and intersection structure from diverse internet data.
  • Experiments on NuScenes show reliable trajectory prediction, and the method can transfer to an electric scooter platform with light fine-tuning.
  • The authors argue that distilling ego-motion at scale yields structured path proposals that generalize beyond the demonstrated trajectory, so that trajectory hypotheses can be estimated by image segmentation.
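
The label-generation step described above (future camera positions, dropped onto the ground plane, turned into an image mask) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a pinhole camera with known intrinsics `K`, a level camera at a known height above flat ground, a fixed vehicle half-width, and SfM positions that have already been scaled to metres. The function names `project_ground_path` and `path_mask` and all numeric values are hypothetical.

```python
import numpy as np

def project_ground_path(positions, K, cam_height, half_width):
    """Project the ground footprint of future camera positions into the image.

    positions: (N, 3) future camera centres in the current camera frame
               (x right, y down, z forward), assumed metric and with z > 0.
    Returns (left_px, right_px): two (N, 2) arrays of pixel coordinates.
    """
    positions = np.asarray(positions, dtype=float)
    # Assumption: level camera over flat ground. y points down, so adding the
    # camera height moves each camera centre onto the ground plane.
    ground = positions + np.array([0.0, cam_height, 0.0])
    # Heading in the ground (x-z) plane from finite differences along the path.
    heading = np.gradient(ground[:, [0, 2]], axis=0)
    heading /= np.linalg.norm(heading, axis=1, keepdims=True)
    # Unit vector pointing to the right of the heading: (hx, hz) -> (hz, -hx).
    right = np.stack(
        [heading[:, 1], np.zeros(len(ground)), -heading[:, 0]], axis=1
    )
    left_pts = ground - half_width * right
    right_pts = ground + half_width * right

    def to_pixels(pts):
        uvw = pts @ K.T  # pinhole projection in homogeneous coordinates
        return uvw[:, :2] / uvw[:, 2:3]

    return to_pixels(left_pts), to_pixels(right_pts)

def path_mask(left_px, right_px, height, width):
    """Fill the region between the two projected path edges, row by row.

    Simplification: each edge is treated as a function of the image row,
    which holds for mostly forward motion but not for sharp turns.
    """
    mask = np.zeros((height, width), dtype=bool)
    rows = np.arange(height)
    cols = np.arange(width)
    # np.interp needs increasing sample points, so sort each edge by row.
    lo, ro = np.argsort(left_px[:, 1]), np.argsort(right_px[:, 1])
    u_left = np.interp(rows, left_px[lo, 1], left_px[lo, 0],
                       left=np.nan, right=np.nan)
    u_right = np.interp(rows, right_px[ro, 1], right_px[ro, 0],
                        left=np.nan, right=np.nan)
    # Rows outside the projected path stay empty.
    valid = ~(np.isnan(u_left) | np.isnan(u_right))
    mask[valid] = (cols[None, :] >= u_left[valid, None]) & (
        cols[None, :] <= u_right[valid, None]
    )
    return mask

# Hypothetical example: driving straight ahead from 5 m to 30 m.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
path = np.stack([np.zeros(6), np.zeros(6), np.linspace(5.0, 30.0, 6)], axis=1)
left_px, right_px = project_ground_path(path, K, cam_height=1.5, half_width=1.0)
mask = path_mask(left_px, right_px, height=720, width=1280)
```

In this sketch the mask covers a wedge that narrows toward the horizon, which is the kind of traversed-region label the summary describes. A real pipeline would also need to handle the scale ambiguity of monocular SfM, camera pitch and roll, and non-planar roads, none of which are modelled here.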

Abstract

We present a scalable self-supervised approach for segmenting feasible vehicle trajectories from monocular images for autonomous driving in complex urban environments. Leveraging large-scale dashcam videos, we treat recorded ego-vehicle motion as implicit supervision and recover camera trajectories via monocular structure-from-motion, projecting them onto the ground plane to generate spatial masks of traversed regions without manual annotation. These automatically generated labels are used to train a deep segmentation network that predicts motion-conditioned path proposals from a single RGB image at run time, without explicit modeling of road or lane markings. Trained on diverse, unconstrained internet data, the model implicitly captures scene layout, lane topology, and intersection structure, and generalizes across varying camera configurations. We evaluate our approach on NuScenes, demonstrating reliable trajectory prediction, and further show transfer to an electric scooter platform through light fine-tuning. Our results indicate that large-scale ego-motion distillation yields structured and generalizable path proposals beyond the demonstrated trajectory, enabling trajectory hypothesis estimation via image segmentation.
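
A predicted path mask can be compared against an auto-generated label mask with an overlap score. The summary does not state which metric the authors report on NuScenes, so the intersection-over-union (IoU) below is only an illustrative choice, and the function name `mask_iou` is hypothetical.

```python
import numpy as np

def mask_iou(pred, target):
    """Intersection-over-union of two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Convention chosen here: two empty masks count as a perfect match.
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)

# Toy example: two 4x4 masks, each two columns wide, overlapping in one column.
pred = np.zeros((4, 4), dtype=bool)
pred[:, 0:2] = True
target = np.zeros((4, 4), dtype=bool)
target[:, 1:3] = True
score = mask_iou(pred, target)  # 4 shared pixels out of 12 in the union
```

Because the labels mark only the path that was actually driven, a low IoU does not necessarily mean a prediction is infeasible; it may simply propose a different valid path, which is worth keeping in mind when reading any overlap-based score.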