Test-Time Adaptation for EEG Foundation Models: A Systematic Study under Real-World Distribution Shifts

arXiv cs.LG / 4/21/2026

Key Points

  • The paper investigates how EEG foundation models cope with real-world distribution shifts across clinical settings, devices, and populations. These shifts hinder clinical deployment, where target data are unlabeled and labeled data are scarce.
  • It introduces NeuroAdapt-Bench, a systematic benchmark for evaluating test-time adaptation (TTA) methods on EEG foundation models under realistic distribution shifts, with datasets spanning in-distribution, out-of-distribution, and extreme modality shifts such as Ear-EEG.
  • Across multiple pretrained EEG foundation models and downstream tasks, standard TTA methods show inconsistent improvements and can even degrade performance during inference.
  • Gradient-based TTA approaches are found to be especially prone to severe degradation, while optimization-free methods are more stable and deliver more reliable gains.
  • The authors conclude that existing general TTA techniques have significant limitations for EEG and recommend domain-specific adaptation strategies going forward.
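
To make the term "optimization-free" concrete, here is a minimal sketch of one common idea from the general TTA literature: re-estimating feature normalization statistics from the unlabeled target batch, with no gradient updates. It is an illustration under stated assumptions, not a method confirmed to be part of NeuroAdapt-Bench. The names `standardize`, `batch_stats`, `source_means`, `source_stds`, and `target_batch` are hypothetical, and the numbers are made up.

```python
from statistics import fmean, pstdev


def batch_stats(batch):
    """Return per-feature means and population standard deviations of a batch."""
    columns = list(zip(*batch))
    return [fmean(c) for c in columns], [pstdev(c) for c in columns]


def standardize(batch, means, stds, eps=1e-8):
    """Standardize every feature column with the supplied statistics."""
    return [
        [(x - m) / (s + eps) for x, m, s in zip(row, means, stds)]
        for row in batch
    ]


# Hypothetical statistics stored with the pretrained model (source domain).
source_means = [0.0, 0.0]
source_stds = [1.0, 1.0]

# Hypothetical unlabeled target batch, e.g. from a device with a different
# offset and gain per channel. Values are invented for illustration.
target_batch = [[4.0, 10.0], [6.0, 30.0], [5.0, 20.0], [5.0, 20.0]]

# Without adaptation: source statistics are applied to shifted target data.
unadapted = standardize(target_batch, source_means, source_stds)

# Optimization-free adaptation: recompute the statistics on the target batch.
target_means, target_stds = batch_stats(target_batch)
adapted = standardize(target_batch, target_means, target_stds)

print("unadapted column means:", batch_stats(unadapted)[0])
print("adapted column means:  ", batch_stats(adapted)[0])
```

Because no model weights change, a poor batch can only distort the normalization for that batch. It cannot accumulate into the parameters, which is one plausible reason such methods behave more stably.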

Abstract

Electroencephalography (EEG) foundation models have shown strong potential for learning generalizable representations from large-scale neural data, yet their clinical deployment is hindered by distribution shifts across clinical settings, devices, and populations. Test-time adaptation (TTA) offers a promising solution by enabling models to adapt to unlabeled target data during inference without access to source data, a valuable property in healthcare settings constrained by privacy regulations and limited labeled data. However, its effectiveness for EEG remains largely underexplored. In this work, we introduce NeuroAdapt-Bench, a systematic benchmark for evaluating test-time adaptation methods on EEG foundation models under realistic distribution shifts. We evaluate representative TTA approaches from other domains across multiple pretrained foundation models, diverse downstream tasks, and heterogeneous datasets spanning in-distribution, out-of-distribution, and extreme modality shifts (e.g., Ear-EEG). Our results show that standard TTA methods yield inconsistent gains and often degrade performance, with gradient-based approaches particularly prone to heavy degradation. In contrast, optimization-free methods demonstrate greater stability and more reliable improvements. These findings highlight the limitations of existing TTA techniques in EEG, provide guidance for future development, and underscore the need for domain-specific adaptation strategies.
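
As a rough illustration of what a gradient-based TTA update looks like, the sketch below minimizes the entropy of a model's own prediction, a widely used objective in the general TTA literature. Whether the benchmark evaluates this exact objective is not stated in this summary. To keep the example tiny, the only adapted parameter is a hypothetical scalar `scale` multiplying fixed `logits`; both names and all values are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_grad_wrt_scale(logits, scale):
    """Gradient of H(softmax(scale * logits)) with respect to scale.

    Analytically this equals -scale * Var_p(logits), where p is the
    softmax distribution at the current scale.
    """
    probs = softmax([scale * z for z in logits])
    mean = sum(p * z for p, z in zip(probs, logits))
    var = sum(p * (z - mean) ** 2 for p, z in zip(probs, logits))
    return -scale * var


# Hypothetical logits for one unlabeled test sample (three classes).
logits = [1.0, 0.5, -0.2]
scale = 1.0   # hypothetical adaptable parameter
lr = 0.5      # hypothetical learning rate

before = entropy(softmax([scale * z for z in logits]))
for _ in range(5):
    # Gradient descent on entropy: no labels are used anywhere.
    scale -= lr * entropy_grad_wrt_scale(logits, scale)
after = entropy(softmax([scale * z for z in logits]))

print("entropy before:", before)
print("entropy after: ", after)
print("adapted scale: ", scale)
```

In this toy setup, each step makes the prediction more confident without changing which class is predicted, so an initially wrong prediction simply becomes a more confident wrong one. This is a general property of the toy objective, offered as intuition only. It is not the paper's own explanation of the degradation it reports.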