MicroVision: An Open Dataset and Benchmark Models for Detecting Vulnerable Road Users and Micromobility Vehicles

arXiv cs.CV / 3/20/2026

Key Points

The MicroVision project introduces an open image dataset with annotations for detecting vulnerable road users (VRUs) and micromobility vehicles (MMVs) from a VRU perspective. It addresses gaps in existing datasets, which often label pedestrians and MMV riders alike as "person" or omit newer MMVs such as e-scooters.
The data were collected in Gothenburg, Sweden, comprising over 8,000 anonymized full-HD images with more than 30,000 labeled VRUs and MMVs across nearly 2,000 interaction scenes gathered over a year.
Baseline benchmark object-detection models based on state-of-the-art architectures achieve a mean average precision of up to 0.723 on an unseen test set.
The dataset and model weights are publicly available at https://doi.org/10.71870/eepz-jd52, enabling researchers and practitioners to train and evaluate VRU and MMV detection systems for traffic safety and monitoring.
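
To make the reported metric concrete, here is a minimal sketch of how mean average precision is commonly computed for object detection. The summary does not state which variant the authors used, so the IoU threshold of 0.5, the greedy matching rule, and all function names here are assumptions for illustration, not the paper's evaluation code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """AP for a single class.

    detections: list of (score, box); ground_truth: list of boxes.
    The threshold is an assumed value, not taken from the paper.
    """
    if not ground_truth:
        return 0.0
    matched = set()
    hits = []
    # Visit detections from most to least confident.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        is_hit = best_idx is not None and best_iou >= iou_threshold
        if is_hit:
            matched.add(best_idx)
        hits.append(is_hit)

    # Running precision and recall after each detection.
    precisions, recalls, true_positives = [], [], 0
    for rank, is_hit in enumerate(hits, start=1):
        true_positives += is_hit
        precisions.append(true_positives / rank)
        recalls.append(true_positives / len(ground_truth))

    # Interpolate so precision never increases as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated precision-recall curve.
    ap, previous_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


def mean_average_precision(per_class):
    """per_class maps a class name to (detections, ground_truth)."""
    aps = [average_precision(d, g) for d, g in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

Benchmarks differ in whether they report AP at a single IoU threshold or averaged over several thresholds, so a figure such as 0.723 is only comparable across papers that use the same convention.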

Abstract

Micromobility is a growing mode of transportation, raising new challenges for traffic safety and planning due to increased interactions in areas where vulnerable road users (VRUs) share the infrastructure with micromobility, including parked micromobility vehicles (MMVs). Approaches to support traffic safety and planning increasingly rely on detecting road users in images -- a computer-vision task relying heavily on the quality of the images to train on. However, existing open image datasets for training such models lack focus and diversity in VRUs and MMVs, for instance, by categorizing both pedestrians and MMV riders as "person", or by not including new MMVs like e-scooters. Furthermore, datasets are often captured from a car perspective and lack data from areas where only VRUs travel (sidewalks, cycle paths). To help close this gap, we introduce the MicroVision dataset: an open image dataset and annotations for training and evaluating models for detecting the most common VRUs (pedestrians, cyclists, e-scooterists) and stationary MMVs (bicycles, e-scooters), from a VRU perspective. The dataset, recorded in Gothenburg (Sweden), consists of more than 8,000 anonymized, full-HD images with more than 30,000 carefully annotated VRUs and MMVs, captured over an entire year and part of almost 2,000 unique interaction scenes. Along with the dataset, we provide first benchmark object-detection models based on state-of-the-art architectures, which achieved a mean average precision of up to 0.723 on an unseen test set. The dataset and model can support traffic safety to distinguish between different VRUs and MMVs, or help monitoring systems identify the use of micromobility. The dataset and model weights can be accessed at https://doi.org/10.71870/eepz-jd52.
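
The labeling gap described in the abstract can be illustrated with a small sketch. The five class names follow the abstract's wording, but their exact spelling in the released annotations is an assumption, and the coarse scheme shown is a hypothetical stand-in for a dataset that merges riders into "person" and has no e-scooter class.

```python
from collections import Counter

# Classes named in the abstract, grouped as VRUs or stationary MMVs.
# The string spellings are assumed, not read from the dataset files.
MICROVISION_GROUPS = {
    "pedestrian": "VRU",
    "cyclist": "VRU",
    "e-scooterist": "VRU",
    "bicycle": "MMV",
    "e-scooter": "MMV",
}

# Hypothetical coarse scheme: riders collapse into "person" and
# e-scooters have no class at all (None means the object is dropped).
COARSE_SCHEME = {
    "pedestrian": "person",
    "cyclist": "person",
    "e-scooterist": "person",
    "bicycle": "bicycle",
    "e-scooter": None,
}


def collapse(labels):
    """Count labels after mapping them onto the coarse scheme."""
    mapped = (COARSE_SCHEME[label] for label in labels)
    return Counter(label for label in mapped if label is not None)


scene = ["pedestrian", "cyclist", "e-scooterist", "e-scooter", "bicycle"]
fine_counts = Counter(scene)      # five distinct classes
coarse_counts = collapse(scene)   # three "person", one "bicycle"
```

In this toy scene the fine-grained labels keep all five objects distinct, while the coarse scheme merges the three road users and loses the parked e-scooter entirely.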