CIPHER: Conformer-based Inference of Phonemes from High-density EEG

arXiv cs.AI / 4/6/2026

Key Points

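The study addresses the task of estimating speech information (phonemes in particular) from high-density scalp EEG and proposes CIPHER, a Conformer-based model with two pathways: ERP features and broadband DDA coefficients.
On OpenNeuro ds006104 (24 participants, two studies with concurrent TMS), the binary articulatory tasks reach near-ceiling accuracy but are shown to be highly vulnerable to confounds such as acoustic onset separability and TMS-target blocking.
On the primary 11-class CVC phoneme task, under Study 2 LOSO (16 held-out subjects), fine-grained phoneme discriminability is limited: real-word WER is high, at 0.671 ± 0.080 for ERP and 0.688 ± 0.096 for DDA.
The authors position the work as a benchmark and feature (representation) comparison rather than an EEG-to-text system, and they limit their neural-representation claims to confound-controlled evidence.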
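For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between a reference and a hypothesis, divided by the reference length. The sketch below follows that standard definition. How the paper tokenizes or scores its "real-word" WER is not stated in this summary, so the function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not the authors' implementation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace. The paper's own
    tokenization and scoring are not specified in this summary.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (rolling-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a WER near 0.67 would mean roughly two word-level errors for every three reference words, which fits the summary's reading that fine-grained discriminability is limited.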

Abstract

Decoding speech information from scalp EEG remains difficult due to low SNR and spatial blurring. We present CIPHER (Conformer-based Inference of Phonemes from High-density EEG Representations), a dual-pathway model using (i) ERP features and (ii) broadband DDA coefficients. On OpenNeuro ds006104 (24 participants, two studies with concurrent TMS), binary articulatory tasks reach near-ceiling performance but are highly confound-vulnerable (acoustic onset separability and TMS-target blocking). On the primary 11-class CVC phoneme task under full Study 2 LOSO (16 held-out subjects), performance is substantially lower (real-word WER: ERP 0.671 +/- 0.080, DDA 0.688 +/- 0.096), indicating limited fine-grained discriminability. We therefore position this work as a benchmark and feature-comparison study rather than an EEG-to-text system, and we constrain neural-representation claims to confound-controlled evidence.
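The leave-one-subject-out (LOSO) protocol mentioned above can be sketched as follows: each subject is held out once, and the model is trained on trials from all remaining subjects, so no test subject's data leaks into training. This is a minimal illustration, not the authors' pipeline. The helper names `loso_splits` and `summarize` are hypothetical, and reading the reported "+/-" as a standard deviation across held-out subjects is an assumption that the abstract does not confirm.

```python
from statistics import mean, stdev


def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per subject.

    `subject_ids` has one entry per trial, naming the subject who produced it.
    """
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx


def summarize(per_subject_scores):
    """Return (mean, sample standard deviation) over held-out subjects.

    Assumption: the "+/-" in the abstract is a spread of this kind;
    the abstract does not say which statistic it is.
    """
    return mean(per_subject_scores), stdev(per_subject_scores)


# Usage with toy labels (not data from the paper): five trials, three subjects.
trial_subjects = ["s1", "s1", "s2", "s3", "s2"]
for held_out, train_idx, test_idx in loso_splits(trial_subjects):
    print(held_out, "train:", train_idx, "test:", test_idx)
```

With 16 held-out subjects, as in Study 2, this loop would produce 16 folds and therefore 16 per-subject scores to aggregate.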