Learning from Change: Predictive Models for Incident Prevention in a Regulated IT Environment

arXiv cs.AI / 4/16/2026

Key Points

  • The paper addresses how predictive incident risk scoring can improve IT change management in highly regulated environments like finance, where reliability, auditability, and explainability are required.
  • It proposes an ML-based approach that assists engineers in assessing and planning change deployments by estimating the likelihood that a change will induce incidents.
  • To meet regulatory constraints, the model is designed for interpretability and traceability using SHAP feature-level explanations so that decisions can be audited.
  • Using a one-year real-world dataset from a large international bank, the authors compare the existing rule-based assessment process with three ML models (HGBC, LightGBM, XGBoost) and find that LightGBM performs best.
  • The study also shows that adding aggregated team/organizational metrics improves predictive performance, suggesting that organizational context can enhance risk forecasting beyond purely technical features.
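
To make the idea of feature-level explanations concrete, here is a minimal sketch of exact Shapley value attribution, the quantity that SHAP approximates or computes efficiently for tree models. It is not the authors' pipeline and does not use the SHAP library: the scoring function `toy_risk`, its coefficients, and the feature names `systems_touched`, `out_of_hours`, and `team_incident_rate` are all hypothetical, chosen only so the arithmetic can be checked by hand.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley attribution of predict(instance) - predict(baseline).

    Brute-force enumeration over feature subsets; only practical for a
    handful of features. Missing features are filled from `baseline`.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        # Features in the coalition take the instance's value,
        # all others stay at their baseline value.
        mixed = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_risk(change):
    # Hypothetical risk score with one interaction term; NOT a fitted model.
    return (
        0.02
        + 0.03 * change["systems_touched"]
        + 0.10 * change["out_of_hours"]
        + 0.05 * change["systems_touched"] * change["out_of_hours"]
        + 0.50 * change["team_incident_rate"]
    )


# Hypothetical change under review and a hypothetical "typical" change.
instance = {"systems_touched": 3, "out_of_hours": 1, "team_incident_rate": 0.2}
baseline = {"systems_touched": 1, "out_of_hours": 0, "team_incident_rate": 0.1}

phi = shapley_values(toy_risk, instance, baseline)
for feature, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{feature:>20}: {contribution:+.3f}")
```

The attributions sum to the gap between the score of the change under review and the score of the baseline change, which is the property that makes a per-feature breakdown usable as an audit trail. In practice a library implementation for tree ensembles would replace the brute-force loop.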

Abstract

Effective IT change management is important for businesses that depend on software and services, particularly in highly regulated sectors such as finance, where operational reliability, auditability, and explainability are essential. A significant portion of IT incidents are caused by changes, making it important to identify high-risk changes before deployment. This study presents a predictive incident risk scoring approach at a large international bank. The approach supports engineers during the assessment and planning phases of change deployments by predicting the likelihood that a change will induce an incident. To satisfy regulatory constraints, we built the model with auditability and explainability in mind, applying SHAP values to provide feature-level insights and ensure decisions are traceable and transparent. Using a one-year real-world dataset, we compare the existing rule-based process with three machine learning models: HGBC, LightGBM, and XGBoost. LightGBM achieved the best performance, particularly when enriched with aggregated team metrics that capture organisational context. Our results show that data-driven, interpretable models can outperform rule-based approaches while meeting compliance needs, enabling proactive risk mitigation and more reliable IT operations.
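
The abstract does not say which aggregated team metrics were used, so the following is only one plausible illustration: a per-team incident rate computed from each team's earlier changes and attached to every change as an extra feature. The record fields (`team`, `deployed_at`, `caused_incident`) and the function `add_team_incident_rate` are assumptions made for this sketch, and it simplifies by treating a change's outcome as known as soon as the next change is deployed.

```python
from collections import defaultdict


def add_team_incident_rate(changes):
    """Attach each team's prior change count and prior incident rate.

    Only changes deployed earlier are counted, so a change's own outcome
    never leaks into its own features.
    """
    totals = defaultdict(int)     # changes seen so far, per team
    incidents = defaultdict(int)  # incident-inducing changes so far, per team
    enriched = []
    # ISO 8601 date strings sort chronologically as plain strings.
    for change in sorted(changes, key=lambda c: c["deployed_at"]):
        team = change["team"]
        prior = totals[team]
        rate = incidents[team] / prior if prior else 0.0
        enriched.append(
            {**change, "team_prior_changes": prior, "team_prior_incident_rate": rate}
        )
        # Update the running counts only after the features are emitted.
        totals[team] += 1
        incidents[team] += int(change["caused_incident"])
    return enriched


# Hypothetical change records, invented for illustration.
changes = [
    {"id": "C1", "team": "payments", "deployed_at": "2025-01-03", "caused_incident": True},
    {"id": "C2", "team": "payments", "deployed_at": "2025-01-10", "caused_incident": False},
    {"id": "C3", "team": "cards", "deployed_at": "2025-01-11", "caused_incident": False},
    {"id": "C4", "team": "payments", "deployed_at": "2025-01-20", "caused_incident": False},
]

enriched = add_team_incident_rate(changes)
for row in enriched:
    print(row["id"], row["team"], row["team_prior_changes"], row["team_prior_incident_rate"])
```

Computing the aggregate from strictly earlier changes keeps the feature honest at prediction time: the score for a planned change can only depend on history that would actually be available when the change is assessed.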