Towards Successful Implementation of Automated Raveling Detection: Effects of Training Data Size, Illumination Difference, and Spatial Shift

arXiv cs.CV / 4/16/2026


Key Points

  • The paper addresses why ML/deep-learning raveling (aggregate loss) detectors degrade in real-world large-scale deployments when inference data differ by run, sensor, and environment.
  • It studies how robustness is affected by three controlled factors—training data size, illumination differences, and spatial shifts—using variation-controlled experimentation.
  • The authors introduce RavelingArena, a benchmark built by augmenting an existing dataset with diverse, controlled variations to quantify each factor’s impact on performance.
  • Experiments show that both increasing and diversifying training data substantially improve accuracy, yielding at least a 9.2% gain under the most diverse conditions.
  • A case study on multi-year highway testing in Georgia demonstrates improved year-to-year consistency, supporting future work on temporal deterioration modeling.

Abstract

Raveling, the loss of aggregates, is a major form of asphalt pavement surface distress, especially on highways. While research has shown that machine learning and deep learning-based methods yield promising results for raveling detection by classification on range images, their performance often degrades in large-scale deployments, where more diverse inference data may originate from different runs, sensors, and environmental conditions. This degradation highlights the need for a more generalizable and robust solution for real-world implementation. Thus, the objectives of this study are to 1) identify and assess potential variations that impact model robustness, such as the quantity of training data, illumination differences, and spatial shifts; and 2) leverage these findings to enhance model robustness under real-world conditions. To this end, we propose RavelingArena, a benchmark designed to evaluate model robustness to variations in raveling detection. Instead of collecting extensive new data, it is built by augmenting an existing dataset with diverse, controlled variations, thereby enabling variation-controlled experiments that quantify the impact of each variation. Results demonstrate that both the quantity and diversity of training data are critical to model accuracy, with at least a 9.2% gain in accuracy under the most diverse conditions in our experiments. Additionally, a case study applying these findings to a multi-year test section in Georgia, U.S., shows significant improvements in year-to-year consistency, laying foundations for future studies on temporal deterioration modeling. These insights provide guidance for more reliable model deployment in raveling detection and other real-world tasks that require adaptability to diverse conditions.
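The benchmark-construction idea described above (synthesizing controlled illumination and spatial-shift variants from an existing dataset rather than collecting new data) can be sketched as follows. This is an illustrative reconstruction, not the paper's actual pipeline: the function names, offset ranges, and fill policy are all assumptions for demonstration, and real range images would need sensor-appropriate normalization.

```python
import numpy as np

def adjust_illumination(img: np.ndarray, delta: float) -> np.ndarray:
    """Simulate an illumination difference as a uniform intensity offset,
    clipped to the valid [0, 1] range (assumed normalized input)."""
    return np.clip(img + delta, 0.0, 1.0)

def spatial_shift(img: np.ndarray, dx: int, dy: int, fill: float = 0.0) -> np.ndarray:
    """Simulate a spatial shift by translating the image by (dx, dy) pixels
    and padding the exposed border with a constant fill value."""
    out = np.full_like(img, fill)
    h, w = img.shape[:2]
    src_y = slice(max(0, -dy), min(h, h - dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_y = slice(max(0, dy), min(h, h + dy))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out

def variation_grid(img, deltas=(-0.2, 0.0, 0.2), shifts=(-8, 0, 8)):
    """Expand one base image into a grid of controlled variants; because each
    variant differs in exactly one known (delta, dx, dy) setting, a factor's
    effect on detection accuracy can be measured in isolation."""
    return [
        (d, dx, dy, spatial_shift(adjust_illumination(img, d), dx, dy))
        for d in deltas for dx in shifts for dy in shifts
    ]
```

Evaluating a fixed detector over such a grid, rather than over freshly collected data, is what makes the experiments "variation-controlled": degradation can be attributed to a specific factor rather than to uncontrolled run-to-run differences.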