Multimodal LLMs are not all you need for Pediatric Speech Language Pathology

arXiv cs.CL / 4/30/2026


Key Points

  • The paper studies how to classify Pediatric Speech Sound Disorders (SSD) more effectively, addressing the real-world challenge of limited clinician staffing and overwhelming caseloads.
  • It proposes a hierarchical, cascading classification pipeline that moves from binary classification to disorder type and then to symptom classification using the SLPHelmUltraSuitePlus benchmark.
  • By fine-tuning Speech Representation Models (SRM) and applying targeted data augmentation, the authors mitigate biases seen in prior work and improve performance across all benchmark clinical tasks.
  • The study also extends the same data augmentation approach to Automatic Speech Recognition (ASR), further evaluating the method beyond diagnosis/classification.
  • Across evaluated tasks, SRM-based approaches outperform the current LLM-based state of the art by a substantial margin, and the authors release models and code to support follow-on research.
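The cascading pipeline described above can be sketched as a chain of gated stages: later classifiers run only when the earlier stage flags the sample. This is a minimal, hypothetical sketch of the control flow; the stage models are stand-in callables (the paper fine-tunes Speech Representation Models for each stage), and the label names are illustrative, not taken from the benchmark.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class CascadeResult:
    disordered: bool
    disorder_type: Optional[str] = None
    symptom: Optional[str] = None

def classify_cascade(
    features,                 # speech embedding for one utterance
    binary_stage: Callable,   # -> True if speech is flagged as disordered
    type_stage: Callable,     # -> disorder-type label
    symptom_stage: Callable,  # -> symptom label given the disorder type
) -> CascadeResult:
    """Run later stages only when the earlier stage flags the sample."""
    if not binary_stage(features):
        return CascadeResult(disordered=False)
    dtype = type_stage(features)
    symptom = symptom_stage(features, dtype)
    return CascadeResult(disordered=True, disorder_type=dtype, symptom=symptom)

# Toy usage with threshold-based stand-ins (illustrative labels only):
res = classify_cascade(
    features=0.9,
    binary_stage=lambda x: x > 0.5,
    type_stage=lambda x: "articulation",
    symptom_stage=lambda x, t: "substitution" if t == "articulation" else "other",
)
# res.disorder_type == "articulation"; res.symptom == "substitution"
```

One practical upside of this gating is that most inputs (typical speech) exit after the cheap binary stage, and each downstream classifier trains on a narrower, more balanced label space.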

Abstract

Speech Sound Disorders (SSD) affect roughly five percent of children, yet speech-language pathologists face severe staffing shortages and unmanageable caseloads. We test a hierarchical approach to SSD classification on the granular, multi-task SLPHelmUltraSuitePlus benchmark, proposing a cascading pipeline that moves from binary classification to disorder-type classification and then to symptom classification. By fine-tuning Speech Representation Models (SRMs) and applying targeted data augmentation, we mitigate biases identified in prior work and improve on all clinical tasks in the benchmark. We also apply our data augmentation approach to Automatic Speech Recognition (ASR). Our results demonstrate that SRMs consistently outperform the LLM-based state of the art across all evaluated tasks by a large margin. We publish our models and code to foster future research.
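As a rough illustration of what waveform-level data augmentation looks like in this setting, the sketch below applies two common speech transforms, additive noise and speed perturbation. These particular transforms are an assumption for illustration; the abstract does not specify which augmentations the authors use, and real pipelines typically rely on library implementations rather than these toy versions.

```python
import random

def add_noise(samples, noise_std=0.01, seed=0):
    """Mix low-amplitude Gaussian noise into a waveform (list of floats)."""
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, noise_std) for s in samples]

def speed_perturb(samples, factor=1.5):
    """Crudely change speaking rate by index-skipping resampling.

    factor > 1 drops samples (faster speech); factor < 1 repeats them.
    """
    out, i = [], 0.0
    while int(i) < len(samples):
        out.append(samples[int(i)])
        i += factor
    return out

# Toy waveform of six samples:
wave = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
noisy = add_noise(wave)                    # same length, perturbed values
faster = speed_perturb(wave, factor=1.5)   # shorter sequence -> faster speech
# len(faster) == 4
```

Augmentations like these are usually applied on the fly during fine-tuning, so each epoch sees slightly different versions of the same (often scarce) clinical recordings.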