LPLCv2: An Expanded Dataset for Fine-Grained License Plate Legibility Classification

arXiv cs.CV / April 13, 2026


Key Points

  • The paper introduces LPLCv2, an expanded and corrected benchmark dataset for fine-grained Automatic License Plate Legibility Classification, growing the original dataset to more than three times its size with additional capture days, revised annotations, and new label types.
  • LPLCv2 adds multi-level supervision, including license-plate bounding boxes, transcribed text, and legibility levels, as well as vehicle-level make/model/type/color and rich image-level metadata such as camera identity, capture conditions, acquisition time, and day ID.
  • The authors propose a novel training procedure using an Exponential Moving Average-based loss and a refined learning-rate scheduler aimed at reducing common testing-time errors.
  • Using these improvements, a baseline model reportedly reaches an 89.5% F1-score on the test set, outperforming the previous state of the art.
  • A new evaluation protocol explicitly mitigates potential camera contamination between training and evaluation splits, and the resulting performance impact is reported as small; the dataset and code are released publicly on GitHub.
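The paper does not detail how the Exponential Moving Average-based loss is computed, but one plausible reading is that per-step loss values are smoothed with an EMA before being used for optimization or monitoring. A minimal sketch of such smoothing (the function name and the `decay` hyperparameter are illustrative assumptions, not from the paper):

```python
def ema_smooth(losses, decay=0.9):
    """Hypothetical sketch: smooth a sequence of per-batch loss
    values with an exponential moving average.

    `decay` is an assumed hyperparameter; higher values give
    smoother, slower-moving estimates.
    """
    ema = losses[0]
    smoothed = [ema]
    for loss in losses[1:]:
        # Standard EMA update: blend previous estimate with new value.
        ema = decay * ema + (1 - decay) * loss
        smoothed.append(ema)
    return smoothed
```

For example, a noisy spike in the raw loss is damped in the smoothed sequence, which can stabilize scheduler decisions that key off loss plateaus.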

Abstract

Modern Automatic License Plate Recognition (ALPR) systems achieve outstanding performance in controlled, well-defined scenarios. However, large-scale real-world usage remains challenging due to low-quality imaging devices, compression artifacts, and suboptimal camera installation. Identifying illegible license plates (LPs) has recently become feasible through a dedicated benchmark; however, its impact has been limited by its small size and annotation errors. In this work, we expand the original benchmark to over three times its size with two extra capture days, revise its annotations, and introduce novel labels. LP-level annotations include bounding boxes, text, and legibility level, while vehicle-level annotations comprise make, model, type, and color. Image-level annotations feature camera identity, capture conditions (e.g., rain and faulty cameras), acquisition time, and day ID. We present a novel training procedure featuring an Exponential Moving Average-based loss function and a refined learning rate scheduler, addressing common mistakes in testing. These improvements enable a baseline model to achieve an 89.5% F1-score on the test set, considerably surpassing the previous state of the art. We further introduce a novel protocol that explicitly addresses camera contamination between training and evaluation splits, with results showing only a small impact. Dataset and code are publicly available at https://github.com/lmlwojcik/LPLCv2-Dataset.
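The camera-contamination protocol is described only at a high level, but the core idea of a camera-disjoint evaluation can be sketched as follows. This is an illustrative implementation, not the paper's code; the `camera_id` field and the function name are assumed:

```python
def camera_disjoint_split(samples, eval_cameras):
    """Hypothetical sketch: partition samples so that no camera ID
    appears in both training and evaluation splits, preventing a
    model from exploiting camera-specific artifacts it saw in
    training.

    `samples` is assumed to be a list of dicts with a 'camera_id'
    key, matching the image-level metadata described in the paper.
    """
    train = [s for s in samples if s["camera_id"] not in eval_cameras]
    evaluation = [s for s in samples if s["camera_id"] in eval_cameras]
    return train, evaluation
```

Holding out whole cameras (rather than random images) is a stricter test: any drop in score relative to a random split measures how much the model relied on camera identity, which the paper reports to be small.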