MoDAl: Self-Supervised Neural Modality Discovery via Decorrelation for Speech Neuroprosthesis

arXiv cs.CL / 5/4/2026

Key Points

  • The paper introduces MoDAl, a self-supervised framework for discovering diverse neural modalities to improve speech neuroprosthesis decoding when audible speech is not present.
  • MoDAl combines two objectives in a shared projection space: (1) a contrastive loss that aligns each of several parallel brain encoders with the text embeddings of a pretrained LLM, and (2) a decorrelation loss that keeps the encoders from coalescing into redundant representations.
  • The authors prove that the two objectives are in “productive tension”: contrastive alignment induces transitive modality coalescence, which decorrelation must counteract for the framework to discover diverse, complementary modalities.
  • On the Brain-to-Text Benchmark ’24, MoDAl reduces word error rate (WER) from 26.3% to 21.6% relative to the previous best end-to-end method; the gain from incorporating previously discarded area 44 signals arises entirely from the decorrelation mechanism.
  • Analysis indicates functional specialization: encoders using area 44 capture structural and syntactic features such as grammatical voice, wh-words, and sentence length, aligning with known roles of Broca’s area.
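
To make the interplay of the two objectives concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the summary does not give the exact loss formulas, so the InfoNCE-style contrastive term, the squared cross-correlation penalty, the `temperature` and `decorr_weight` parameters, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce(brain, text, temperature=0.1):
    """InfoNCE-style contrastive loss (assumed form).

    Row i of `brain` is the positive pair for row i of `text`;
    every other row in the batch acts as a negative.
    """
    logits = l2_normalize(brain) @ l2_normalize(text).T / temperature
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))


def decorrelation(z_a, z_b, eps=1e-8):
    """Mean squared cross-correlation between two encoders' features (assumed form).

    The value is large when the encoders carry the same information
    and near zero when their features are uncorrelated.
    """
    a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    cross = a.T @ b / len(a)  # (dim, dim) cross-correlation matrix
    return np.mean(cross ** 2)


def combined_loss(encoder_outputs, text, decorr_weight=1.0):
    """Sum of per-encoder alignment terms plus pairwise decorrelation terms."""
    n = len(encoder_outputs)
    align = sum(info_nce(z, text) for z in encoder_outputs)
    decorr = sum(
        decorrelation(encoder_outputs[i], encoder_outputs[j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return align + decorr_weight * decorr


# Toy data: hypothetical text embeddings and two encoder outputs.
rng = np.random.default_rng(0)
text = rng.normal(size=(256, 16))
z_aligned = text + 0.1 * rng.normal(size=(256, 16))  # close to the text embeddings
z_independent = rng.normal(size=(256, 16))           # unrelated to z_aligned

# A duplicated encoder is penalized more than an independent one.
collapsed_penalty = decorrelation(z_aligned, z_aligned.copy())
diverse_penalty = decorrelation(z_aligned, z_independent)
```

In this toy setup the alignment term alone would be satisfied by two identical encoders, which is the coalescence the key points describe; the pairwise penalty is what makes the duplicated configuration more costly than a diverse one.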

Abstract

Speech neuroprosthesis systems decode intended speech from neural activity in the absence of audible output, offering a path to restoring communication for individuals with speech-impairing conditions. Current approaches decode predominantly from motor cortical areas, discarding others -- such as area 44, part of Broca's area -- that may encode complementary linguistic information. We introduce MoDAl (Modality Decorrelation and Alignment), a framework that discovers complementary neural modalities through the interplay of two objectives in a shared projection space. A contrastive loss aligns each of several parallel brain encoders with the text embeddings of a pretrained large language model (LLM), while a decorrelation loss prevents the encoders from coalescing to duplicative representations. We prove that these objectives are in productive tension: Contrastive alignment induces transitive modality coalescence, which decorrelation must counteract for the framework to discover diverse neurolinguistic modalities. On the Brain-to-Text Benchmark '24, MoDAl reduces word error rate (WER) from 26.3% to 21.6% compared to the previous best end-to-end method, with the gain from incorporating previously discarded area 44 signals arising entirely from the decorrelation mechanism. Analysis of the discovered modalities reveals functional specialization: Encoders receiving area 44 input capture structural and syntactic properties (sentence length, grammatical voice, wh-words), consistent with the neurolinguistic understanding of Broca's area.
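
For readers unfamiliar with the metric, the sketch below shows one common way to compute word error rate as word-level edit distance divided by reference length, along with the arithmetic behind the reported numbers: going from 26.3% to 21.6% is a 4.7-point absolute drop, or roughly a 17.9% relative reduction. The function names are hypothetical, and the benchmark's exact scoring procedure (for example, whether errors are pooled over the whole corpus) is not described in the source, so treat this as a per-sentence illustration only.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(old, new):
    """Fraction by which `new` improves on `old`."""
    return (old - new) / old


one_substitution = word_error_rate("the quick brown fox", "the quick brown dog")
reported_gain = relative_reduction(26.3, 21.6)
```

Here `one_substitution` is 0.25 (one wrong word out of four), and `reported_gain` is about 0.179.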