AI Navigate

An Optimised Greedy-Weighted Ensemble Framework for Financial Loan Default Prediction

arXiv cs.LG / 3/20/2026

Key Points

  • The study introduces an Optimised Greedy-Weighted Ensemble framework for loan default prediction that adaptively assigns model weights based on empirical predictive performance.
  • It combines multiple machine learning classifiers with hyperparameters optimised via Particle Swarm Optimisation, and merges their outputs using a regularised greedy weighting scheme.
  • A neural-network-based meta-learner is employed within a stacked ensemble to capture higher-order relationships among model predictions.
  • On the Lending Club dataset, the BlendNet ensemble achieves an AUC of 0.80, a macro F1-score of 0.73, and a default recall of 0.81, with calibration analysis showing tree-based ensembles provide reliable probability estimates while the stacked ensemble offers strong ranking.
  • Recursive Feature Elimination identifies revolving utilisation, annual income, and debt-to-income ratio as top predictors of loan default, illustrating interpretable, performance-driven credit risk modelling.
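
The summary does not spell out how the regularised greedy weighting works, so the following is only a minimal sketch of one plausible reading: greedy forward selection with replacement, where a model's weight is the share of rounds in which it was picked, and a small penalty per distinct model acts as the regulariser. The function names (`auc`, `greedy_weights`), the `penalty` parameter, the model names, and the toy scores are all assumptions made for illustration, not details from the paper.

```python
from typing import Dict, List


def auc(y_true: List[int], scores: List[float]) -> float:
    """Rank-based AUC: share of (positive, negative) pairs ordered correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def greedy_weights(preds: Dict[str, List[float]], y_true: List[int],
                   rounds: int = 20, penalty: float = 0.01) -> Dict[str, float]:
    """Greedy forward selection with replacement; weights are pick frequencies."""
    counts = {name: 0 for name in preds}
    blend_sum = [0.0] * len(y_true)
    best_score = float("-inf")
    for step in range(1, rounds + 1):
        best_name, best_cand = None, best_score
        for name, p in preds.items():
            trial = [(b + x) / step for b, x in zip(blend_sum, p)]
            # Assumed regulariser: a fixed cost for each distinct model in the blend.
            n_models = sum(c > 0 for c in counts.values()) + (counts[name] == 0)
            score = auc(y_true, trial) - penalty * n_models
            if score > best_cand:
                best_name, best_cand = name, score
        if best_name is None:  # no candidate improves the penalised score
            break
        counts[best_name] += 1
        blend_sum = [b + x for b, x in zip(blend_sum, preds[best_name])]
        best_score = best_cand
    total = sum(counts.values())
    return {name: c / total for name, c in counts.items()}


# Hypothetical validation labels (1 = default) and model scores.
y = [0, 0, 1, 1, 0, 1]
preds = {
    "model_a": [0.1, 0.6, 0.7, 0.4, 0.2, 0.9],
    "model_b": [0.3, 0.2, 0.4, 0.8, 0.5, 0.6],
    "model_c": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],  # uninformative baseline
}
weights = greedy_weights(preds, y)
blended = [sum(weights[m] * preds[m][i] for m in preds) for i in range(len(y))]
print(weights, auc(y, blended))
```

In this toy case the two complementary models share the weight and the uninformative one receives none. In practice the weights would be fitted on held-out predictions rather than on the training data, to avoid rewarding overfitted models.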

Abstract

Accurate prediction of loan defaults is a central challenge in credit risk management, particularly in modern financial datasets characterised by nonlinear relationships, class imbalance, and evolving borrower behaviour. Traditional statistical models and static ensemble methods often struggle to maintain reliable performance under such conditions. This study proposes an Optimised Greedy-Weighted Ensemble framework for loan default prediction that dynamically allocates model weights based on empirical predictive performance. The framework integrates multiple machine learning classifiers, with their hyperparameters first optimised using Particle Swarm Optimisation. Model predictions are then combined via a regularised greedy weighting mechanism. At the same time, a neural-network-based meta-learner is employed within a stacked ensemble to capture higher-order relationships among model outputs. Experiments conducted on the Lending Club dataset demonstrate that the proposed framework improves predictive performance compared with individual classifiers. The BlendNet ensemble achieved the strongest results with an AUC of 0.80, a macro-average F1-score of 0.73, and a default recall of 0.81. Calibration analysis further shows that tree-based ensembles such as Extra Trees and Gradient Boosting provide the most reliable probability estimates, while the stacked ensemble offers superior ranking capability. Feature analysis using Recursive Feature Elimination identifies revolving utilisation, annual income, and debt-to-income ratio as the most influential predictors of loan default. These findings demonstrate that performance-driven ensemble weighting can improve both predictive accuracy and interpretability in credit risk modelling. The proposed framework provides a scalable data-driven approach to support institutional credit assessment, risk monitoring, and financial decision-making.
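
To make the Particle Swarm Optimisation step concrete, here is a minimal sketch of a textbook PSO loop in plain Python. It is not the authors' configuration: the swarm size, inertia and acceleration coefficients, the search bounds, and the `toy_loss` objective (a stand-in for a cross-validated loss over two hypothetical hyperparameters, log10 learning rate and tree depth) are all assumptions chosen for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimise(objective: Callable[[List[float]], float],
                 bounds: Sequence[Tuple[float, float]],
                 n_particles: int = 20, n_iters: int = 50,
                 inertia: float = 0.7, c_personal: float = 1.5,
                 c_social: float = 1.5, seed: int = 0):
    """Textbook PSO: returns (best position, best value, best-value history)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Pull towards the particle's own best and the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (pbest[i][d] - pos[i][d])
                             + c_social * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


def toy_loss(x: List[float]) -> float:
    """Hypothetical validation loss with its minimum at (-2.0, 6.0)."""
    return (x[0] + 2.0) ** 2 + 0.1 * (x[1] - 6.0) ** 2


bounds = [(-4.0, 0.0), (2.0, 12.0)]  # assumed ranges: log10(lr), tree depth
best_pos, best_val, history = pso_minimise(toy_loss, bounds)
print(best_pos, best_val)
```

In a real pipeline the objective would wrap a classifier's cross-validated score (negated, so that lower is better), and integer hyperparameters such as depth would be rounded before the model is fitted.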