LI-DSN: A Layer-wise Interactive Dual-Stream Network for EEG Decoding

arXiv cs.LG / 4/3/2026


Key Points

  • The paper introduces LI-DSN, a layer-wise interactive dual-stream neural network designed for EEG decoding that addresses the “information silo” issue of late-fusion dual-stream architectures.
  • LI-DSN adds progressive cross-stream communication at every layer via a Temporal-Spatial Integration Attention (TSIA) mechanism, built from a Spatial Affinity Correlation Matrix (SACM) that captures inter-electrode spatial structural relationships and a Temporal Channel Aggregation Matrix (TCAM) that integrates cosine-gated temporal dynamics under spatial guidance.
  • It also proposes an adaptive fusion strategy with learnable channel weights to optimize how temporal and spatial stream features are integrated.
  • Experiments on eight diverse EEG datasets covering motor imagery classification, emotion recognition, and SSVEP show LI-DSN significantly outperforms 13 state-of-the-art baseline models in robustness and decoding performance.
  • The authors note that the code will be made public after acceptance.

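Since the paper's code has not yet been released, the adaptive fusion step can only be sketched from the description above. The following minimal NumPy illustration assumes the simplest reading of "learnable channel weights": one learnable logit per channel, mapped through a sigmoid to blend the temporal and spatial stream features. All function and variable names here are hypothetical, not the authors' API.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_fusion(temporal_feat, spatial_feat, channel_logits):
    """Blend dual-stream features with per-channel learnable weights.

    temporal_feat, spatial_feat: (channels, time) feature maps.
    channel_logits: (channels,) learnable parameters; sigmoid maps
    them to mixing weights in (0, 1), one per channel.
    """
    w = sigmoid(channel_logits)[:, None]           # (channels, 1)
    return w * temporal_feat + (1.0 - w) * spatial_feat

# Toy usage: two channels, three time steps.
t = np.ones((2, 3))        # stand-in temporal-stream features
s = np.zeros((2, 3))       # stand-in spatial-stream features
fused = adaptive_fusion(t, s, np.array([0.0, 10.0]))
# Channel 0 (logit 0) mixes the streams equally; channel 1 (logit 10)
# leans almost entirely on the temporal stream.
```

In a real network the logits would be optimized jointly with the rest of the model, letting each channel learn how much of each stream to keep.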
Abstract

Electroencephalography (EEG) provides a non-invasive window into brain activity, offering high temporal resolution crucial for understanding and interacting with neural processes through brain-computer interfaces (BCIs). Current dual-stream neural networks for EEG often process temporal and spatial features independently through parallel branches, delaying their integration until a final, late-stage fusion. This design inherently leads to an "information silo" problem, precluding intermediate cross-stream refinement and hindering spatial-temporal decompositions essential for full feature utilization. We propose LI-DSN, a layer-wise interactive dual-stream network that facilitates progressive, cross-stream communication at each layer, thereby overcoming the limitations of late-fusion paradigms. LI-DSN introduces a novel Temporal-Spatial Integration Attention (TSIA) mechanism, which constructs a Spatial Affinity Correlation Matrix (SACM) to capture inter-electrode spatial structural relationships and a Temporal Channel Aggregation Matrix (TCAM) to integrate cosine-gated temporal dynamics under spatial guidance. Furthermore, we employ an adaptive fusion strategy with learnable channel weights to optimize the integration of dual-stream features. Extensive experiments across eight diverse EEG datasets, encompassing motor imagery (MI) classification, emotion recognition, and steady-state visual evoked potentials (SSVEP), consistently demonstrate that LI-DSN significantly outperforms 13 state-of-the-art (SOTA) baseline models, showcasing its superior robustness and decoding performance. The code will be publicized after acceptance.
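The abstract's SACM/TCAM description can likewise only be approximated before the code release. The sketch below assumes SACM is a softmax-normalized cosine-affinity matrix over electrodes and TCAM aggregates each electrode's temporal signal from spatially related electrodes under a cosine gate; these are plausible readings of the text, not the paper's actual formulation, and every name is hypothetical.

```python
import numpy as np

def spatial_affinity(x):
    """SACM-style matrix (assumption): cosine similarity between
    electrode feature vectors, row-normalized with a softmax so each
    row is a distribution over spatially related electrodes."""
    xn = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    sim = xn @ xn.T                                # (C, C) cosine affinities
    e = np.exp(sim - sim.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def spatially_guided_temporal(x, affinity):
    """TCAM-style step (assumption): mix each electrode's temporal
    signal with a spatially aggregated version, gated by the cosine
    similarity between the two (mapped from [-1, 1] to [0, 1])."""
    agg = affinity @ x                             # (C, T) spatial mixing
    num = (x * agg).sum(axis=1)
    den = np.linalg.norm(x, axis=1) * np.linalg.norm(agg, axis=1) + 1e-8
    gate = ((num / den + 1.0) / 2.0)[:, None]      # cosine gate per electrode
    return gate * agg + (1.0 - gate) * x

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))                   # 8 electrodes, 16 samples
A = spatial_affinity(x)                            # rows sum to 1
y = spatially_guided_temporal(x, A)                # same shape as input
```

Because the interaction preserves the feature shape, a block like this can be inserted at every layer of both streams, which is the "layer-wise interactive" idea the abstract contrasts with late fusion.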