Traffic Sign Recognition in Autonomous Driving: Dataset, Benchmark, and Field Experiment

arXiv cs.CV / March 25, 2026


Key Points

  • The paper introduces TS-1M, a large-scale, globally diverse traffic sign dataset with over one million real-world images across 454 standardized categories, aimed at improving real-world diagnostic evaluation for traffic sign recognition (TSR).
  • It proposes a diagnostic benchmark with challenge-oriented settings—such as cross-region recognition, rare-class identification, low-clarity robustness, and semantic text understanding—to reveal where different TSR approaches break down.
  • The authors evaluate TS-1M across three learning paradigms—classical supervised models, self-supervised pretrained models, and multimodal vision-language models (VLMs)—and find paradigm-dependent performance patterns.
  • Their analysis suggests semantic alignment is critical for cross-region generalization and rare-category recognition, while purely visual models are more vulnerable to appearance shifts and data imbalance.
  • The work validates TS-1M’s practical relevance via real-scene autonomous driving experiments that combine TSR with semantic reasoning and spatial localization for map-level decision constraints.

Abstract

Traffic Sign Recognition (TSR) is a core perception capability for autonomous driving, where robustness to cross-region variation, long-tailed categories, and semantic ambiguity is essential for reliable real-world deployment. Despite steady progress in recognition accuracy, existing traffic sign datasets and benchmarks offer limited diagnostic insight into how different modeling paradigms behave under these practical challenges. We present TS-1M, a large-scale and globally diverse traffic sign dataset comprising over one million real-world images across 454 standardized categories, together with a diagnostic benchmark designed to analyze model capability boundaries. Beyond standard train-test evaluation, we provide a suite of challenge-oriented settings, including cross-region recognition, rare-class identification, low-clarity robustness, and semantic text understanding, enabling systematic and fine-grained assessment of modern TSR models. Using TS-1M, we conduct a unified benchmark across three representative learning paradigms: classical supervised models, self-supervised pretrained models, and multimodal vision-language models (VLMs). Our analysis reveals consistent paradigm-dependent behaviors, showing that semantic alignment is a key factor for cross-region generalization and rare-category recognition, while purely visual models remain sensitive to appearance shift and data imbalance. Finally, we validate the practical relevance of TS-1M through real-scene autonomous driving experiments, where traffic sign recognition is integrated with semantic reasoning and spatial localization to support map-level decision constraints. Overall, TS-1M establishes a reference-level diagnostic benchmark for TSR and provides principled insights into robust and semantic-aware traffic sign perception. Project page: https://guoyangzhao.github.io/projects/ts1m.
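The challenge-oriented settings described above (rare-class identification and cross-region recognition) amount to slicing a single prediction set along several axes. A minimal sketch of such a diagnostic breakdown is below; the function name, the frequency-based rarity threshold, and all field names are illustrative assumptions, not TS-1M's actual evaluation protocol.

```python
from collections import Counter, defaultdict

def diagnostic_report(labels, preds, regions, rare_threshold=50):
    """Per-setting accuracy breakdown in the spirit of a diagnostic TSR
    benchmark: overall, rare-class, and per-region accuracy.

    labels/preds: ground-truth and predicted category names per image.
    regions: source region per image (for the cross-region slice).
    rare_threshold: classes with fewer samples than this count as "rare"
    (an illustrative cutoff, not the paper's definition).
    """
    counts = Counter(labels)
    rare = {c for c, n in counts.items() if n < rare_threshold}

    total = correct = 0
    rare_total = rare_correct = 0
    by_region = defaultdict(lambda: [0, 0])  # region -> [correct, total]

    for y, p, r in zip(labels, preds, regions):
        hit = int(y == p)
        total += 1
        correct += hit
        if y in rare:
            rare_total += 1
            rare_correct += hit
        by_region[r][0] += hit
        by_region[r][1] += 1

    return {
        "overall_acc": correct / total,
        "rare_acc": rare_correct / rare_total if rare_total else None,
        "region_acc": {r: c / t for r, (c, t) in by_region.items()},
    }
```

Separating the report this way makes the paper's central observation measurable: a model can score well on `overall_acc` while its `rare_acc` or the accuracy for an unseen region collapses, which is exactly the failure mode a single aggregate metric hides.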