ParkingScenes: A Structured Dataset for End-to-End Autonomous Parking in Simulation Scenes

arXiv cs.CV · April 28, 2026


Key Points

  • ParkingScenes is a new multimodal, structured dataset for end-to-end autonomous parking in simulation, created to address the lack of high-quality parking-specific data.
  • The dataset is built on the CARLA simulator and uses a Hybrid A* planner together with a Model Predictive Controller (MPC) to generate structured parking trajectories that provide accurate, reproducible supervision.
  • It contains 16 reverse-in and 6 parallel parking scenarios, each run under two pedestrian conditions (present/absent) and repeated 16 times, yielding (16 + 6) × 2 × 16 = 704 structured episodes and roughly 105,000 frames.
  • Each frame includes synchronized inputs from four RGB cameras, four depth sensors, vehicle motion states, and Bird's-Eye View (BEV) representations, supporting rich multimodal fusion; a hypothetical per-frame schema is sketched after this list.
  • Experiments comparing models trained on ParkingScenes versus unstructured manually collected simulation data show significant performance gains, and the dataset plus collection framework are planned for public release.
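
To make the per-frame payload concrete, here is a minimal sketch of how one synchronized sample could be represented in Python. Every field name below is an assumption for illustration; the paper lists the modalities but does not publish a schema.

```python
from dataclasses import dataclass
from typing import Dict

import numpy as np

@dataclass
class ParkingFrame:
    """One synchronized sample. All field names are illustrative assumptions."""
    timestamp: float                # simulation time of this frame
    rgb: Dict[str, np.ndarray]      # four RGB views, e.g. {"front": HxWx3 uint8, ...}
    depth: Dict[str, np.ndarray]    # four depth maps aligned with the RGB views
    speed: float                    # vehicle motion state: longitudinal speed (m/s)
    steering: float                 # vehicle motion state: normalized steering command
    reverse: bool                   # gear flag, relevant for reverse-in maneuvers
    bev: np.ndarray                 # Bird's-Eye View representation of the scene
    pedestrians_present: bool       # which of the two scenario conditions applies
```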

Abstract

Autonomous parking remains a critical yet challenging task in intelligent driving systems, particularly within constrained urban environments where maneuvering space is limited and precise control is essential. While recent advances in end-to-end learning have shown great promise, the lack of high-quality, structured datasets tailored for parking scenarios remains a significant bottleneck. To address this gap, we present ParkingScenes, a comprehensive multimodal dataset specifically designed for end-to-end autonomous parking in simulated scenes. Built on the CARLA simulator, ParkingScenes features structured parking trajectories generated by a Hybrid A* planner and a Model Predictive Controller (MPC), providing accurate and reproducible supervision signals. The dataset includes 16 reverse-in and 6 parallel parking scenarios, each executed under two pedestrian conditions (present and absent) and repeated 16 times to ensure consistent coverage, resulting in 704 structured episodes and approximately 105,000 frames. Each frame contains synchronized data from four RGB cameras, four depth sensors, vehicle motion states, and Bird's-Eye View (BEV) representations, enabling rich multimodal fusion and context-aware learning. To demonstrate the utility of our dataset, we compare models trained on ParkingScenes with models trained on unstructured, manually collected simulation data under identical conditions. Results show significant performance improvements, underscoring the effectiveness of structured supervision for robust and accurate parking policy learning. By releasing both the dataset and the collection framework, ParkingScenes establishes a scalable and reproducible benchmark for advancing learning-based autonomous parking systems. The dataset and collection framework will be released at: https://github.com/haonan-ai/ParkingScenes
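
For readers who want to build a similar collection pipeline, the sketch below shows synchronized multi-camera capture with the CARLA Python API, mirroring the four-RGB/four-depth rig the abstract describes. The camera poses, the 20 Hz tick rate, and the `make_callback` helper are all assumptions; the authors' released collection framework may be organized differently.

```python
import carla

client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

# Fixed-step synchronous mode: every sensor reports exactly once per tick,
# which is what makes per-frame alignment of all eight cameras reproducible.
settings = world.get_settings()
settings.synchronous_mode = True
settings.fixed_delta_seconds = 0.05  # assumed 20 Hz; the paper does not state a rate
world.apply_settings(settings)

blueprints = world.get_blueprint_library()
vehicle = world.spawn_actor(
    blueprints.filter("vehicle.*")[0],
    world.get_map().get_spawn_points()[0],
)

# Four viewpoints (front/rear/left/right); transforms are illustrative,
# not the paper's actual calibration.
poses = {
    "front": carla.Transform(carla.Location(x=1.5, z=1.6)),
    "rear":  carla.Transform(carla.Location(x=-1.5, z=1.6), carla.Rotation(yaw=180)),
    "left":  carla.Transform(carla.Location(y=-0.9, z=1.6), carla.Rotation(yaw=-90)),
    "right": carla.Transform(carla.Location(y=0.9, z=1.6), carla.Rotation(yaw=90)),
}

latest, sensors = {}, []

def make_callback(key):
    # Store the most recent image from each sensor under a unique key.
    def on_image(image):
        latest[key] = image
    return on_image

for name, pose in poses.items():
    for kind in ("sensor.camera.rgb", "sensor.camera.depth"):
        cam = world.spawn_actor(blueprints.find(kind), pose, attach_to=vehicle)
        cam.listen(make_callback((name, kind)))
        sensors.append(cam)

try:
    for _ in range(100):
        world.tick()  # advance one step; all eight cameras fire once
        # ...assemble `latest` plus vehicle state (vehicle.get_velocity(),
        # vehicle.get_control()) into a frame record and write it to disk
finally:
    for actor in sensors + [vehicle]:
        actor.destroy()
```

Synchronous mode with a fixed time step is the standard way to obtain deterministic sensor timing in CARLA, which lines up with the paper's emphasis on accurate, reproducible supervision signals.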