Using Synthetic Data for Machine Learning-based Childhood Vaccination Prediction in Narok, Kenya
arXiv cs.LG / 4/13/2026
Key Points
- The study addresses the scarcity of high-quality vaccination data in Narok County, Kenya, where nomadic Maasai communities are at higher risk of missing childhood vaccine doses.
- Researchers digitized 8 years of MOH 510 registry records (n=6,913) and trained machine learning models (Logistic Regression and XGBoost) to predict which children are at risk of missing key vaccines.
- The work introduces a privacy-preserving approach using tabular diffusion-based synthetic data generation (TabSyn) to train models without exposing sensitive patient-level information.
- For some vaccines, the models are reported to reach recall, precision, and F1-scores above 90%, and models trained on synthetic data retained predictive accuracy comparable to models trained on real data.
- The authors conclude that synthetic-data-enabled forecasting can support scalable, privacy-preserving immunization planning in low-infrastructure clinical settings.
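As a reminder of what the reported recall, precision, and F1-score measure, here is a minimal sketch in plain Python. It is not the authors' evaluation code: the function name `precision_recall_f1`, the label encoding (1 = child missed the dose) and the example labels are all assumptions made up for illustration.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, f1) for binary labels.

    Assumed encoding (hypothetical): 1 = child missed the dose, 0 = dose received.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # at-risk children correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # flagged, but dose was received
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed dose that was not flagged

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up labels for eight children; not data from the study.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In a setting like this one, recall is often the metric to watch most closely, since a false negative corresponds to an at-risk child who is never flagged for follow-up.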