Phonological Subspace Collapse Is Aetiology-Specific and Cross-Lingually Stable: Evidence from 3,374 Speakers

arXiv cs.CL / 4/24/2026

Key Points

  • The study scales a training-free dysarthria severity assessment method to 3,374 speakers across 12 languages and 5 aetiologies, using frozen self-supervised speech representations and d-prime separability of phonological feature subspaces.
  • Degradation profiles are distinguishable by aetiology at the group level: 10 of 13 phonological features show large effect sizes (epsilon-squared > 0.14, Holm-corrected p < 0.001). Individual-level classification remains limited (22.6% macro F1).
  • The resulting consonant d-prime profile shapes are highly stable across languages for each aetiology (cosine similarity > 0.95), enabling language-independent phenotyping of impairment patterns while requiring within-corpus calibration for absolute severity.
  • The approach is robust across 6 SSL backbones: all produce monotonic severity gradients with inter-model agreement above rho = 0.77, and the severity correlation persists under fixed-token d-prime estimation (rho = -0.733 at 200 tokens per class).
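
To make the two core quantities in these points concrete, here is a minimal Python sketch of a d-prime separability score and a cosine similarity between profiles. The summary does not give the paper's exact d-prime formula, so the definition used here (projection onto the axis joining the two class centroids, then mean difference over the root of the averaged variances) is an assumption, and every function name is hypothetical.

```python
import math
from statistics import mean, variance


def d_prime(pos, neg):
    """Univariate d-prime: |mean difference| / sqrt(average of the two variances)."""
    pooled_sd = math.sqrt(0.5 * (variance(pos) + variance(neg)))
    return abs(mean(pos) - mean(neg)) / pooled_sd


def centroid_axis(pos_vecs, neg_vecs):
    """Unit vector pointing from the negative-class centroid to the positive one."""
    dim = len(pos_vecs[0])
    diff = [
        mean(v[i] for v in pos_vecs) - mean(v[i] for v in neg_vecs)
        for i in range(dim)
    ]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def project(vectors, axis):
    """Project each embedding onto a 1-D axis via the dot product."""
    return [sum(v_i * a_i for v_i, a_i in zip(v, axis)) for v in vectors]


def feature_d_prime(pos_vecs, neg_vecs):
    """d-prime of one phonological feature (assumed definition, see lead-in)."""
    axis = centroid_axis(pos_vecs, neg_vecs)
    return d_prime(project(pos_vecs, axis), project(neg_vecs, axis))


def cosine_similarity(a, b):
    """Cosine similarity between two d-prime profiles of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 5-dimensional profiles (invented numbers, not data from the paper):
# profile_b has the same shape as profile_a at half the magnitude.
profile_a = [2.0, 1.5, 1.0, 0.8, 0.5]
profile_b = [1.0, 0.75, 0.5, 0.4, 0.25]
shape_match = cosine_similarity(profile_a, profile_b)
```

Because cosine similarity ignores overall scale, `shape_match` is 1 (up to floating-point error) even though the two toy profiles differ in magnitude by a factor of two. That is the sense in which profile shape can be stable across languages while absolute d-prime values still need within-corpus calibration.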

Abstract

We previously introduced a training-free method for dysarthria severity assessment based on d-prime separability of phonological feature subspaces in frozen self-supervised speech representations, validated on 890 speakers across 5 languages with HuBERT-base. Here, we scale the analysis to 3,374 speakers from 25 datasets spanning 12 languages and 5 aetiologies (Parkinson's disease, cerebral palsy, ALS, Down syndrome, and stroke), plus healthy controls, using 6 SSL backbones. We report three findings. First, aetiology-specific degradation profiles are distinguishable at the group level: 10 of 13 features yield large effect sizes (epsilon-squared > 0.14, Holm-corrected p < 0.001), with Parkinson's disease separable from the articulatory execution group at Cohen's d = 0.83; individual-level classification remains limited (22.6% macro F1). Second, profiles show cross-lingual profile-shape stability: cosine similarity of 5-dimensional consonant d-prime profiles exceeds 0.95 across the languages available for each aetiology. Absolute d-prime magnitudes are not cross-lingually calibrated, so the method supports language-independent phenotyping of degradation patterns but requires within-corpus calibration for absolute severity interpretation. Third, the method is architecture-independent: all 6 backbones produce monotonic severity gradients with inter-model agreement exceeding rho = 0.77. Fixed-token d-prime estimation preserves the severity correlation (rho = -0.733 at 200 tokens per class), confirming that the signal is not a token-count artefact. These results support phonological subspace analysis as a robust, training-free framework for aetiology-aware dysarthria characterisation, with evidence of cross-lingual profile-shape stability and cross-backbone robustness in the represented sample.
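
The statistical checks named in the abstract can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' pipeline: Cohen's d is computed with the usual pooled standard deviation, the Holm correction is the standard step-down procedure, and the fixed-token sampler assumes that classes with fewer than the requested number of tokens are skipped (the abstract does not say how such classes are handled). All function names are hypothetical.

```python
import math
import random
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled standard deviation of two samples."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def fixed_token_sample(tokens_by_class, n_per_class=200, seed=0):
    """Draw exactly n_per_class tokens per class, without replacement.

    Assumption: classes with fewer than n_per_class tokens are skipped.
    """
    rng = random.Random(seed)
    return {
        label: rng.sample(tokens, n_per_class)
        for label, tokens in tokens_by_class.items()
        if len(tokens) >= n_per_class
    }


# Toy usage with invented token ids (not data from the paper).
toy_tokens = {"voiced": list(range(500)), "nasal": list(range(150))}
balanced = fixed_token_sample(toy_tokens, n_per_class=200)
```

Equalising the token count per class before computing d-prime removes sample size as a possible driver of the score, which is the concern that the fixed-token result in the abstract addresses.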