Generative Augmentation of Imbalanced Flight Records for Flight Diversion Prediction: A Multi-objective Optimisation Framework

arXiv cs.LG / 4/23/2026


Key Points

  • The study tackles the challenge of predicting rare, high-impact flight diversions by augmenting scarce historical records with synthetic diversion data generated by deep generative models.
  • It evaluates and optimizes three tabular generative approaches (TVAE, CTGAN, and CopulaGAN) using a multi-objective optimization framework with automated hyperparameter search, with a Gaussian Copula model serving as a statistical baseline.
  • A six-stage evaluation process assesses the synthetic data for realism, diversity, operational validity, statistical similarity, fidelity, and predictive utility, that is, whether it improves downstream predictive performance.
  • Experimental results indicate that hyperparameter-optimized generative models both outperform their non-optimized versions and meaningfully improve diversion prediction versus training only on real data.
  • Overall, the work demonstrates a practical method for advancing ML prediction of rare aviation events through quality-controlled generative augmentation.
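
To make the multi-objective selection step concrete, the sketch below shows one common way to choose among hyperparameter trials when several quality scores matter at once: keep only the Pareto-optimal (non-dominated) configurations. This is an illustration, not the paper's procedure. The two objectives (a fidelity score and a utility score, both to be maximised), the configuration names, and all numbers are hypothetical placeholders.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, ...]  # one value per objective; higher is better (assumption)


def dominates(a: Scores, b: Scores) -> bool:
    """True if `a` is at least as good as `b` everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(trials: Dict[str, Scores]) -> List[str]:
    """Return the names of trials that no other trial dominates, sorted by name."""
    return sorted(
        name
        for name, scores in trials.items()
        if not any(
            dominates(other_scores, scores)
            for other_name, other_scores in trials.items()
            if other_name != name
        )
    )


# Hypothetical placeholder scores: (fidelity, utility). Not results from the paper.
example_trials = {
    "config_a": (0.80, 0.60),
    "config_b": (0.70, 0.75),
    "config_c": (0.65, 0.55),
    "config_d": (0.80, 0.55),
}

front = pareto_front(example_trials)
print(front)
```

With these placeholder values, `config_c` and `config_d` are both dominated by `config_a`, so only `config_a` and `config_b` remain as trade-off candidates. A hyperparameter search library would typically produce such a front automatically; the pairwise check here is only meant to show the underlying idea.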

Abstract

Flight diversions are rare but high-impact events in aviation, making their reliable prediction vital for both safety and operational efficiency. However, their scarcity in historical records impedes the training of machine learning models utilised to predict them. This study addresses this scarcity gap by investigating how generative models can augment historical flight data with synthetic diversion records to enhance model training and improve predictive accuracy. We propose a multi-objective optimisation framework coupled with automated hyperparameter search to identify optimal configurations for three deep generative models: Tabular Variational Autoencoder (TVAE), Conditional Tabular Generative Adversarial Network (CTGAN), and CopulaGAN, with the Gaussian Copula (GC) model serving as a statistical baseline. The quality of the synthetic data was examined through a six-stage evaluation framework encompassing realism, diversity, operational validity, statistical similarity, fidelity, and predictive utility. Results show that the optimised models significantly outperform their non-optimised counterparts, and that synthetic augmentation substantially improves diversion prediction compared to models trained solely on real data. These findings demonstrate the effectiveness of hyperparameter-optimised generative models for advancing predictive modelling of rare events in air transportation.
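
As an illustration of what a statistical similarity check on synthetic data can look like, the sketch below computes the two-sample Kolmogorov-Smirnov statistic between a real and a synthetic numeric column. The abstract does not say which similarity measures the authors use, so the choice of this statistic is an assumption, and the function name and sample values are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(real: Sequence[float], synthetic: Sequence[float]) -> float:
    """Largest gap between the empirical CDFs of two samples (0 = identical, 1 = disjoint)."""
    if not real or not synthetic:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(real), sorted(synthetic)
    # The empirical CDFs only change at observed values, so checking those is enough.
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        for v in points
    )


# Hypothetical placeholder values standing in for one numeric flight feature.
real_column = [1.0, 2.0, 3.0, 4.0]
synthetic_column = [3.0, 4.0, 5.0, 6.0]

print(ks_statistic(real_column, synthetic_column))
```

A value near 0 suggests the synthetic column follows the real distribution closely, while a value near 1 suggests the two barely overlap. In practice such a score would be computed per column and combined with the other evaluation stages rather than used alone.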