Comparative Analysis of Polygon-Based and Global Machine Learning Models for Bus Occupancy Prediction

arXiv cs.LG / 5/4/2026


Key Points

  • The paper tackles bus ridership forecasting errors caused by treating an entire city as one homogeneous region, and proposes a spatially localized modeling strategy.
  • It builds a framework that combines spatial clustering with multi-dimensional feature analysis using bus stop/route/time ridership data plus open sources like spatial attraction features, weather, and temporal patterns.
  • Urban areas are clustered so that nearby bus stops with similar ridership behaviors form groups, and a separate local forecasting model is trained per cluster.
  • The localized, spatially-aware approach achieves accuracy comparable to global machine learning models while enabling more targeted public-transport service improvements.
  • Overall, the study suggests that incorporating geographic context via clustering is an effective strategy for transit demand forecasting, with accuracy on par with global models.

Abstract

Accurate forecasting of bus ridership (passenger numbers) is crucial for efficient management and optimization of public transport systems. Traditional forecasting models often fail to capture the unique and localized dynamics of different urban areas by treating the entire city as a single, homogeneous region. This paper introduces a novel framework that enhances bus ridership prediction by integrating a spatial clustering methodology with multi-dimensional feature analysis. The proposed framework utilizes a diverse set of data, including bus ridership data (by route number, time, and bus stop) complemented by a variety of open-source data, such as spatial features (e.g., attractive destinations), meteorological conditions (e.g., temperature, rainfall), and temporal patterns (e.g., time of day, day of week). By clustering the urban area into distinct regions, based on the principle that bus stops in close proximity share similar ridership characteristics, a separate local forecasting model is trained for each of these clusters. This localized approach achieves accuracy comparable to that of global models. The findings suggest that a spatially-aware, localized modeling strategy is effective for public transport prediction, paving the way for more targeted and efficient service improvements.
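
The cluster-then-fit idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the clustering algorithm or the forecasting model, so plain k-means on stop coordinates and ordinary least squares stand in for both, and the data are synthetic. The names `stop_xy`, `stop_idx`, `fit_local_models`, and `predict_local` are hypothetical.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (a stand-in for the paper's clustering step)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every center, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members):
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centers.
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1), centers

def fit_linear(X, y):
    """Least-squares fit with an intercept term."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def fit_local_models(stop_xy, X, y, stop_idx, k):
    """Cluster stops by location, then fit one model per cluster."""
    labels, _ = kmeans(stop_xy, k)
    row_cluster = labels[stop_idx]  # cluster of the stop behind each observation
    models = {}
    for j in range(k):
        mask = row_cluster == j
        if mask.any():
            models[j] = fit_linear(X[mask], y[mask])
    return labels, models

def predict_local(models, labels, X, stop_idx):
    """Route each observation to the model of its stop's cluster."""
    row_cluster = labels[stop_idx]
    out = np.full(len(X), np.nan)
    for j, coef in models.items():
        mask = row_cluster == j
        out[mask] = predict_linear(coef, X[mask])
    return out

# Synthetic data (assumption): two well-separated groups of 20 stops each,
# with ridership depending on different features in each area.
rng = np.random.default_rng(42)
stop_xy = np.vstack([
    rng.normal([0.0, 0.0], 0.5, size=(20, 2)),
    rng.normal([10.0, 10.0], 0.5, size=(20, 2)),
])
n_obs = 400
stop_idx = rng.integers(0, 40, size=n_obs)
hour = rng.uniform(0, 24, size=n_obs)   # temporal feature
rain = rng.uniform(0, 5, size=n_obs)    # weather feature
X = np.column_stack([hour, rain])
in_second_area = stop_idx >= 20
y = np.where(in_second_area, 30.0 - 2.0 * rain, 5.0 + 1.0 * hour)
y = y + rng.normal(0, 1.0, size=n_obs)

labels, models = fit_local_models(stop_xy, X, y, stop_idx, k=2)
local_mse = np.mean((predict_local(models, labels, X, stop_idx) - y) ** 2)
global_mse = np.mean((predict_linear(fit_linear(X, y), X) - y) ** 2)
print(f"local MSE: {local_mse:.3f}, global MSE: {global_mse:.3f}")
```

The comparison here is in-sample on data built to differ between areas, so it only shows how the two pipelines are wired, not how they rank on real ridership data. A faithful evaluation would hold out later time periods and use the paper's own features and models.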
