Recurrence-Based Nonlinear Vocal Dynamics as Digital Biomarkers for Depression Detection from Conversational Speech

arXiv cs.LG / 4/30/2026


Key Points

  • The study proposes that depression may be detectable from conversational speech by analyzing nonlinear temporal “recurrence” patterns in vocal dynamics rather than relying on static acoustic descriptors.
  • Researchers modeled frame-level COVAREP vocal trajectories as nonlinear dynamical systems and computed recurrence-based digital biomarkers from 74 vocal channels in the DAIC-WOZ depression subset (142 labeled participants).
  • Logistic regression with feature selection and stratified cross-validation showed recurrence-based biomarkers outperformed several static and alternative nonlinear feature baselines, achieving a mean cross-validated AUC of 0.689.
  • Statistical testing supported the result’s significance (permutation test p=0.004), and pooled predictions produced an AUC of 0.665 with a 95% bootstrap confidence interval of [0.568, 0.758].
  • Overall, the findings suggest recurrence-structure analysis and nonlinear state-space modeling are promising directions for digital psychiatric biomarkers.
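
To make the idea of "recurrence" in the key points concrete, here is a minimal sketch of a recurrence matrix and recurrence rate for a single vocal channel. It is not the study's implementation. The function names, the time-delay embedding parameters (dim, tau), and the threshold rule (radius_frac, a fraction of the largest pairwise distance) are illustrative assumptions, and the summary does not state which recurrence measures the authors computed.

```python
import numpy as np


def delay_embed(x, dim=3, tau=1):
    """Time-delay embedding: turn a 1-D series into dim-dimensional state vectors."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("series too short for the requested embedding")
    return np.column_stack([x[i * tau: i * tau + n] for i in range(dim)])


def recurrence_matrix(x, dim=3, tau=1, radius_frac=0.1):
    """Boolean matrix R[i, j] = True when states i and j lie within a radius.

    The radius is radius_frac times the largest pairwise distance
    (an assumed, commonly used heuristic; not taken from the paper).
    """
    states = delay_embed(x, dim, tau)
    diffs = states[:, None, :] - states[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    eps = radius_frac * dists.max()
    return dists <= eps


def recurrence_rate(rec):
    """Fraction of off-diagonal state pairs that are recurrent."""
    n = rec.shape[0]
    off_diag = np.count_nonzero(rec) - np.count_nonzero(np.diag(rec))
    return off_diag / (n * (n - 1))


# Hypothetical stand-in for one frame-level channel (e.g. a pitch contour).
channel = np.sin(np.linspace(0, 20, 200))
rate = recurrence_rate(recurrence_matrix(channel))
```

The pairwise distance matrix grows quadratically with the number of frames, so a long recording would likely need windowing or downsampling before a computation like this is practical.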

Abstract

Digital biomarkers for depression have largely relied on static acoustic descriptors, pooled summary statistics, or conventional machine learning representations. Such approaches may miss nonlinear temporal organization embedded in conversational vocal dynamics. We hypothesized that depression is associated with altered recurrence structure in vocal state trajectories, reflecting changes in how the vocal system revisits acoustic states over time. Using the depression subset of the DAIC-WOZ corpus with 142 labeled participants, we modeled frame-level COVAREP trajectories as nonlinear dynamical systems and derived recurrence-based biomarkers from 74 vocal channels. Logistic regression with feature selection and stratified cross-validation evaluated classification performance. Recurrence-based biomarkers achieved a mean cross-validated AUC of 0.689, exceeding static acoustic baselines, entropy-dynamics features, Hurst exponent features, determinism features, and Lyapunov-like instability proxies. Permutation testing indicated statistical significance with p=0.004. Pooled cross-validated predictions yielded AUC 0.665 with a 95% bootstrap confidence interval of [0.568, 0.758]. These findings suggest that depression may be characterized by altered recurrence structure in conversational vocal dynamics and support nonlinear state-space analysis as a promising direction for digital psychiatric biomarkers.
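
The evaluation protocol described in the abstract can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' code. The data are synthetic (only the sample size of 142 is borrowed from the abstract), and the feature count, class balance, selector (SelectKBest with k=10), fold count, and number of permutations and bootstrap resamples are all arbitrary choices. The sketch computes both a mean of per-fold AUCs and an AUC over pooled out-of-fold predictions, the two quantities the abstract reports separately.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import (StratifiedKFold, cross_val_predict,
                                     cross_val_score, permutation_test_score)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a participants-by-biomarkers matrix (assumption).
X, y = make_classification(n_samples=142, n_features=50, n_informative=5,
                           weights=[0.7, 0.3], random_state=0)

# Scaling and feature selection sit inside the pipeline, so they are
# re-fit on each training fold and never see the held-out fold.
model = make_pipeline(StandardScaler(),
                      SelectKBest(f_classif, k=10),
                      LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Mean of the per-fold AUCs.
fold_aucs = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
mean_auc = fold_aucs.mean()

# AUC over out-of-fold predictions pooled across all folds.
probs = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
pooled_auc = roc_auc_score(y, probs)

# Permutation test: re-run cross-validation on label-shuffled copies of y.
_, perm_scores, p_value = permutation_test_score(
    model, X, y, cv=cv, scoring="roc_auc", n_permutations=50, random_state=0)


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = np.random.default_rng(seed)
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))
        if np.unique(y_true[idx]).size < 2:
            continue  # AUC is undefined when a resample has one class
        aucs.append(roc_auc_score(y_true[idx], scores[idx]))
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi


ci_low, ci_high = bootstrap_auc_ci(y, probs)
```

The mean per-fold AUC and the pooled AUC generally differ, because pooling ranks predictions from separately fitted fold models against one another. That may be one reason the abstract reports 0.689 for the former and 0.665 for the latter.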