AI Navigate

ALIGN: Adversarial Learning for Generalizable Speech Neuroprosthesis

arXiv cs.LG / 3/20/2026


Key Points

  • ALIGN is a session-invariant learning framework that enables cross-session generalization for intracortical speech BCIs using multi-domain adversarial neural networks.
  • It jointly trains a feature encoder, a phoneme classifier, and a domain classifier, using adversarial optimization to preserve task-relevant information while suppressing session-specific cues.
  • The approach is semi-supervised: it learns from data recorded across multiple sessions and adapts to new, unseen sessions without requiring labeled data from those sessions.
  • Empirical results show that ALIGN achieves lower phoneme error rate (PER) and word error rate (WER) than baselines on previously unseen sessions, indicating more robust longitudinal BCI decoding.
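This summary does not give ALIGN's exact loss. As a point of reference, the joint training described in the key points resembles the standard domain-adversarial neural network (DANN) objective of Ganin et al., written below for the multi-domain case; treating it as ALIGN's objective is an assumption. Here G_f is the feature encoder, G_y the phoneme classifier, G_d the domain (session) classifier, d_i ∈ {1, …, K} the session index of sample i, n the number of phoneme-labeled samples, N the number of all samples (labeled and unlabeled sessions), and λ ≥ 0 a trade-off weight. Whether ALIGN's phoneme loss L_y is frame-level cross-entropy or a sequence loss is not stated here. The last line is the usual definition of PER and WER from a minimum edit-distance alignment, counted over phonemes or words respectively.

```latex
% Multi-domain DANN-style saddle-point objective (assumed, not taken from the paper)
E(\theta_f,\theta_y,\theta_d)
  = \frac{1}{n}\sum_{i=1}^{n} \mathcal{L}_y\big(G_y(G_f(x_i;\theta_f);\theta_y),\, y_i\big)
  \;-\; \lambda\,\frac{1}{N}\sum_{i=1}^{N} \mathcal{L}_d\big(G_d(G_f(x_i;\theta_f);\theta_d),\, d_i\big)

% Encoder and phoneme classifier minimize E; the domain classifier maximizes E,
% which is equivalent to minimizing its own K-way session loss \mathcal{L}_d.
(\hat\theta_f,\hat\theta_y) = \arg\min_{\theta_f,\theta_y} E(\theta_f,\theta_y,\hat\theta_d),
\qquad
\hat\theta_d = \arg\max_{\theta_d} E(\hat\theta_f,\hat\theta_y,\theta_d)

% Error rate over reference tokens (phonemes for PER, words for WER):
% S = substitutions, D = deletions, I = insertions, N_ref = reference length
\mathrm{ER} = \frac{S + D + I}{N_{\mathrm{ref}}}
```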

Abstract

Intracortical brain-computer interfaces (BCIs) can decode speech from neural activity with high accuracy when trained on data pooled across recording sessions. In realistic deployment, however, models must generalize to new sessions without labeled data, and performance often degrades due to cross-session nonstationarities (e.g., electrode shifts, neural turnover, and changes in user strategy). In this paper, we propose ALIGN, a session-invariant learning framework based on multi-domain adversarial neural networks for semi-supervised cross-session adaptation. ALIGN trains a feature encoder jointly with a phoneme classifier and a domain classifier operating on the latent representation. Through adversarial optimization, the encoder is encouraged to preserve task-relevant information while suppressing session-specific cues. We evaluate ALIGN on intracortical speech decoding and find that it generalizes consistently better to previously unseen sessions, improving both phoneme error rate and word error rate relative to baselines. These results indicate that adversarial domain alignment is an effective approach for mitigating session-level distribution shift and enabling robust longitudinal BCI decoding.
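Adversarial optimization of this kind is commonly implemented with a gradient reversal step: the domain classifier is trained to predict the session, while the encoder receives the negated, λ-scaled domain gradient so that it learns to make sessions indistinguishable. The stdlib-only Python below is a toy sketch of one such update, assuming a DANN-style setup. The linear encoder, the softmax heads, and every name (`train_step`, `reversed_encoder_grad`, `lam`, the parameter keys, the sizes) are hypothetical illustrations, not ALIGN's architecture or code.

```python
import math
import random

# Toy sketch of one multi-domain adversarial update with gradient reversal.
# Hypothetical setup: linear encoder z = W x, softmax phoneme head A, softmax
# session head B (one class per recording session). Not ALIGN's actual code.

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])

def head_grads(M, z, label):
    """Gradients of softmax cross-entropy w.r.t. head weights M and input z."""
    p = softmax(matvec(M, z))
    delta = [pi - (1.0 if i == label else 0.0) for i, pi in enumerate(p)]
    grad_M = [[d * zj for zj in z] for d in delta]
    grad_z = [sum(M[i][j] * delta[i] for i in range(len(M))) for j in range(len(z))]
    return grad_M, grad_z

def reversed_encoder_grad(g_task, g_domain, lam):
    """Gradient reversal: task gradient passes through, domain gradient is negated and scaled."""
    return [gt - lam * gd for gt, gd in zip(g_task, g_domain)]

def sgd(M, G, lr):
    return [[m - lr * g for m, g in zip(mr, gr)] for mr, gr in zip(M, G)]

def train_step(params, x, phoneme, session, lr=0.1, lam=0.5):
    W, A, B = params["encoder"], params["phoneme_head"], params["domain_head"]
    z = matvec(W, x)
    gA, gz_task = head_grads(A, z, phoneme)  # phoneme head descends the task loss
    gB, gz_dom = head_grads(B, z, session)   # session head descends the domain loss
    gz = reversed_encoder_grad(gz_task, gz_dom, lam)  # encoder ascends the domain loss
    gW = [[g * xj for xj in x] for g in gz]  # chain rule through z = W x
    losses = (cross_entropy(matvec(A, z), phoneme), cross_entropy(matvec(B, z), session))
    params["encoder"] = sgd(W, gW, lr)
    params["phoneme_head"] = sgd(A, gA, lr)
    params["domain_head"] = sgd(B, gB, lr)
    return losses  # losses measured before the update

# Usage with hypothetical sizes: 4 neural features, 3 latent dims,
# 5 phoneme classes, 3 training sessions.
random.seed(0)

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

params = {
    "encoder": rand_matrix(3, 4),
    "phoneme_head": rand_matrix(5, 3),
    "domain_head": rand_matrix(3, 3),
}
task_loss, domain_loss = train_step(params, [0.2, -0.1, 0.5, 0.3], phoneme=2, session=1)
```

The reversal lets a single backward pass serve both players of the min-max game: the session head still minimizes its own loss, while the encoder is pushed in the opposite direction, with `lam` controlling how strongly session cues are suppressed relative to phoneme accuracy. Samples from unlabeled sessions would contribute only the domain term in such a setup.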