Temporal Dropout Risk in Learning Analytics: A Harmonized Survival Benchmark Across Dynamic and Early-Window Representations

arXiv cs.AI · April 13, 2026


Key Points

  • The paper proposes a survival-oriented benchmark for predicting student dropout risk using OULAD, aiming to make comparisons consistent across different temporal modeling protocols.
  • It evaluates two harmonized arms: (1) a dynamic weekly arm, with models fit on a person-period representation, and (2) a continuous-time arm spanning multiple survival model families, including tree-based, parametric, and neural approaches.
  • Performance is assessed through four layers—predictive accuracy, ablation, explainability, and calibration—while the authors caution against a single cross-arm ranking due to methodological differences.
  • In the continuous-time arm, Random Survival Forest performs best on discrimination and horizon-specific Brier scores, while in the dynamic weekly arm Poisson Piecewise-Exponential narrowly leads on integrated Brier score.
  • Explainability, ablation, and calibration collectively indicate that the strongest dropout signal is temporal and behavioral rather than driven by demographic or structural static attributes, with XGBoost AFT showing notable calibration bias.
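The "person-period" representation behind the dynamic weekly arm is a standard discrete-time survival construction: each student contributes one row per observed week, with an event indicator that is 1 only in the dropout week. As an illustrative sketch (the column names and schema here are hypothetical, not the actual OULAD fields or the paper's code):

```python
def to_person_period(students):
    """Expand per-student records into a weekly person-period table.

    `students` is a list of dicts with illustrative keys:
      - id: student identifier
      - last_week: last week the student was observed (1-based)
      - dropped_out: True if the student dropped out in last_week
    Returns (id, week, event) rows; event is 1 only in the dropout
    week, and censored students get event=0 on every row.
    """
    rows = []
    for s in students:
        for week in range(1, s["last_week"] + 1):
            event = int(s["dropped_out"] and week == s["last_week"])
            rows.append((s["id"], week, event))
    return rows

students = [
    {"id": "A", "last_week": 3, "dropped_out": True},   # dropout at week 3
    {"id": "B", "last_week": 2, "dropped_out": False},  # censored after week 2
]
print(to_person_period(students))
# → [('A', 1, 0), ('A', 2, 0), ('A', 3, 1), ('B', 1, 0), ('B', 2, 0)]
```

Once the data are in this long format, discrete-time hazard models (such as the Poisson piecewise-exponential family evaluated in the dynamic arm) can be fit as ordinary regression on the weekly rows.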

Abstract

Student dropout is a persistent concern in Learning Analytics, yet comparative studies frequently evaluate predictive models under heterogeneous protocols, prioritizing discrimination over temporal interpretability and calibration. This study introduces a survival-oriented benchmark for temporal dropout risk modelling using the Open University Learning Analytics Dataset (OULAD). Two harmonized arms are compared: a dynamic weekly arm, with models in person-period representation, and a comparable continuous-time arm, with an expanded roster of families -- tree-based survival, parametric, and neural models. The evaluation protocol integrates four analytical layers: predictive performance, ablation, explainability, and calibration. Results are reported within each arm separately, as a single cross-arm ranking is not methodologically warranted. Within the comparable arm, Random Survival Forest leads in discrimination and horizon-specific Brier scores; within the dynamic arm, Poisson Piecewise-Exponential leads narrowly on integrated Brier score within a tight five-family cluster. No-refit bootstrap sampling variability qualifies these positions as directional signals rather than absolute superiority. Ablation and explainability analyses converged, across all families, on a shared finding: the dominant predictive signal was not primarily demographic or structural, but temporal and behavioral. Calibration corroborated this pattern in the better-discriminating models, with the exception of XGBoost AFT, which exhibited systematic bias. These results support the value of a harmonized, multi-dimensional benchmark in Learning Analytics and situate dropout risk as a temporal-behavioral process rather than a function of static background attributes.
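The Brier scores used to rank models in both arms measure squared error between predicted survival probabilities and observed survival status at a given horizon; the integrated Brier score averages this over a grid of horizons. The sketch below is a simplified illustration, not the paper's evaluation code: it drops subjects censored before the horizon instead of applying the inverse-probability-of-censoring weighting (IPCW) that a full implementation would use.

```python
def brier_score(surv_probs, times, events, horizon):
    """Brier score at a fixed prediction horizon.

    surv_probs[i] is the predicted probability that subject i survives
    past `horizon`; times/events give observed time and event status
    (1 = dropout, 0 = censored). Simplification: subjects censored
    before the horizon are skipped rather than IPCW-reweighted.
    """
    terms = []
    for p, t, e in zip(surv_probs, times, events):
        if t <= horizon and e == 0:
            continue  # censored before horizon: status unknown, skip
        alive = 1.0 if t > horizon else 0.0  # observed survival indicator
        terms.append((alive - p) ** 2)
    return sum(terms) / len(terms)

def integrated_brier_score(surv_fn, times, events, horizons):
    """Trapezoidal average of Brier scores over a horizon grid.

    `surv_fn(h)` returns the model's predicted survival probabilities
    at horizon h for all subjects.
    """
    scores = [brier_score(surv_fn(h), times, events, h) for h in horizons]
    area = 0.0
    for i in range(len(horizons) - 1):
        width = horizons[i + 1] - horizons[i]
        area += 0.5 * (scores[i] + scores[i + 1]) * width
    return area / (horizons[-1] - horizons[0])
```

A perfectly calibrated, perfectly discriminating model scores 0 at every horizon; lower is better, which is why the summary reports "narrow leads" on integrated Brier score within a tight cluster of model families.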