Multilingual Cognitive Impairment Detection in the Era of Foundation Models

arXiv cs.CL / 4/9/2026


Key Points

  • The study evaluates cognitive impairment (CI) classification from speech transcripts in English, Slovene, and Korean using both zero-shot LLM classifiers and supervised tabular models under a leave-one-out setup.
  • Zero-shot LLMs serve as competitive no-training baselines, but supervised tabular approaches generally outperform them, especially when engineered linguistic features are included and fused with transcript embeddings.
  • The experiments compare three input settings—transcript-only, linguistic-features-only, and combined—and show that integrating structured linguistic signals improves robustness across languages.
  • Few-shot tests indicate that the usefulness of limited labeled data varies by language, with some languages gaining more from supervision than others without richer feature representations.
  • The authors conclude that, in small-data CI detection, structured linguistic features and fusion-based classifiers remain reliable and strong compared with purely LLM-driven approaches.
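The fusion-and-evaluation setup described above can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: the feature names, dimensionalities, and the logistic-regression classifier are all assumptions chosen to show early fusion under leave-one-out evaluation.

```python
# Hypothetical sketch: early fusion of engineered linguistic features with
# transcript embeddings, evaluated leave-one-out. All data here is synthetic
# and the feature names are illustrative, not from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
n = 40                                     # small-data regime, as in CI studies
ling_feats = rng.normal(size=(n, 8))       # e.g. pause rate, type-token ratio
embeddings = rng.normal(size=(n, 32))      # transcript embedding vectors
y = rng.integers(0, 2, size=n)             # 1 = cognitive impairment (synthetic)

# Early fusion: concatenate the two modalities before training one classifier.
X = np.hstack([ling_feats, embeddings])
acc = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                      cv=LeaveOneOut()).mean()
print(f"LOO accuracy: {acc:.2f}")
```

Late fusion would instead train one model per modality and combine their predicted probabilities; the concatenation step is what distinguishes the early variant.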

Abstract

We evaluate cognitive impairment (CI) classification from transcripts of speech in English, Slovene, and Korean. We compare zero-shot large language models (LLMs) used as direct classifiers under three input settings -- transcript-only, linguistic-features-only, and combined -- with supervised tabular approaches trained under a leave-one-out protocol. The tabular models operate on engineered linguistic features, transcript embeddings, and early or late fusion of both modalities. Across languages, zero-shot LLMs provide competitive no-training baselines, but supervised tabular models generally perform better, particularly when engineered linguistic features are included and combined with embeddings. Few-shot experiments focusing on embeddings indicate that the value of limited supervision is language-dependent, with some languages benefiting substantially from additional labelled examples while others remain constrained without richer feature representations. Overall, the results suggest that, in small-data CI detection, structured linguistic signals and simple fusion-based classifiers remain strong and reliable.
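The three zero-shot input settings in the abstract can be made concrete with a small prompt-building sketch. The prompt wording, feature names, and label set below are illustrative assumptions, not the authors' actual prompts:

```python
# Hypothetical sketch of the three zero-shot input settings
# (transcript-only, linguistic-features-only, combined).
def build_prompt(transcript=None, features=None):
    """Assemble a classification prompt from whichever inputs are provided."""
    parts = ["Classify the speaker as 'CI' or 'healthy control'."]
    if transcript is not None:
        parts.append(f"Transcript:\n{transcript}")
    if features is not None:
        feat_str = ", ".join(f"{k}={v}" for k, v in features.items())
        parts.append(f"Linguistic features: {feat_str}")
    parts.append("Answer with a single label.")
    return "\n\n".join(parts)

# Combined setting: both the transcript and engineered features are shown.
feats = {"pause_rate": 0.31, "type_token_ratio": 0.42}
prompt = build_prompt(transcript="uh the boy is taking the cookies",
                      features=feats)
print(prompt)
```

The same function covers the transcript-only and features-only settings by passing just one argument, which keeps the comparison across settings controlled.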