Modeling Parkinson's Disease Progression Using Longitudinal Voice Biomarkers: A Comparative Study of Statistical and Neural Mixed-Effects Models

arXiv stat.ML / 4/20/2026


Key Points

  • The paper addresses how to predict Parkinson’s disease progression from longitudinal voice biomarkers collected via telemonitoring, where within-subject correlations and complex patient trajectories complicate analysis.
  • It applies a Neural Mixed Effects (NME) modeling framework to the Oxford Parkinson’s telemonitoring voice dataset and compares it against Generalized Neural Network Mixed Models (GNMM) and semi-parametric Generalized Additive Mixed Models (GAMMs).
  • In small-sample clinical settings, the study finds that neural architectures are highly susceptible to overfitting, yielding substantially worse predictive performance than the simpler statistical baseline.
  • GAMMs achieve the best trade-off between accuracy and interpretability, with reported predictive error (MSE) of 6.56 versus neural baselines with MSE above 90.
  • The authors conclude that for deployable telemonitoring systems under data scarcity, classical mixed-effects approaches remain essential, and that larger, more diverse datasets are needed before neural models can be reliably validated.

Abstract

Predicting Parkinson's Disease (PD) progression is crucial for personalized treatment, and voice biomarkers offer a promising non-invasive method for tracking symptom severity through telemonitoring. However, analyzing these longitudinal data is challenging due to inherent within-subject correlations, the small sample sizes typical of clinical trials, and complex patient-specific progression patterns. While deep learning offers high theoretical flexibility, its application to small-cohort longitudinal studies remains under-explored compared to traditional statistical methods. This study presents an application of the Neural Mixed Effects (NME) framework to Parkinson's telemonitoring, benchmarking it against Generalized Neural Network Mixed Models (GNMM) and a semi-parametric statistical baseline, Generalized Additive Mixed Models (GAMMs). Using the Oxford Parkinson's telemonitoring voice dataset, we demonstrate that while neural architectures offer flexibility, they are prone to significant overfitting in small-sample regimes. Our results indicate that GAMMs provide the optimal balance, achieving superior predictive accuracy (MSE 6.56) compared to neural baselines (MSE > 90) while maintaining clinical interpretability. We discuss the critical implications of these findings for developing robust, deployable telemonitoring systems where data scarcity is a constraint, highlighting the necessity of larger, diverse datasets for neural model validation.
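The within-subject correlation that motivates mixed-effects modeling can be illustrated with a toy random-intercept simulation. This is a minimal NumPy sketch on synthetic data (not the Oxford dataset, and not the paper's NME/GNMM/GAMM implementations): each simulated patient contributes repeated biomarker measurements plus a patient-level random intercept that is correlated with their average biomarker level, so a pooled regression that ignores the grouping produces a biased slope, while a within-subject estimator recovers it.

```python
import numpy as np

# Synthetic longitudinal data (illustrative values only).
# Each "patient" has n_obs repeated measurements; the outcome depends on
# a fixed-effect slope plus a patient-specific random intercept u that is
# correlated with that patient's typical biomarker level.
rng = np.random.default_rng(0)
n_subjects, n_obs = 20, 15
subj = np.repeat(np.arange(n_subjects), n_obs)

beta_true = 2.0                                      # fixed-effect slope
u = rng.normal(0.0, 3.0, n_subjects)                 # random intercepts
x = rng.normal(0.0, 1.0, subj.size) + 0.8 * u[subj]  # voice biomarker
y = beta_true * x + u[subj] + rng.normal(0.0, 1.0, subj.size)

# Pooled OLS ignores the subject grouping, so the intercept/biomarker
# correlation leaks into the slope estimate and biases it upward.
beta_pooled = np.sum(x * y) / np.sum(x * x)

def within_demean(v):
    # Subtract each patient's mean; this removes the random intercept.
    means = np.bincount(subj, weights=v) / n_obs
    return v - means[subj]

# Within-subject estimator: regress demeaned y on demeaned x.
xd, yd = within_demean(x), within_demean(y)
beta_within = np.sum(xd * yd) / np.sum(xd * xd)

print(f"pooled slope: {beta_pooled:.2f}")   # biased away from 2.0
print(f"within slope: {beta_within:.2f}")   # close to 2.0
```

The models compared in the paper build on this idea: GAMMs replace the linear fixed effect with smooth terms while keeping explicit random effects, and NME/GNMM replace it with neural-network components, which is where the overfitting risk in small cohorts arises.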