LI-DSN: A Layer-wise Interactive Dual-Stream Network for EEG Decoding

arXiv cs.LG / 4/3/2026


Key Points

  • The paper introduces LI-DSN, a layer-wise interactive dual-stream neural network designed for EEG decoding that addresses the “information silo” issue of late-fusion dual-stream architectures.
  • LI-DSN adds progressive cross-stream communication at every layer via a Temporal-Spatial Integration Attention (TSIA) mechanism, built from a Spatial Affinity Correlation Matrix (SACM) that captures inter-electrode spatial structural relationships and a Temporal Channel Aggregation Matrix (TCAM) that integrates cosine-gated temporal dynamics under spatial guidance.
  • It also proposes an adaptive fusion strategy with learnable channel weights to optimize how temporal and spatial stream features are integrated.
  • Experiments on eight diverse EEG datasets covering motor imagery classification, emotion recognition, and SSVEP show LI-DSN significantly outperforms 13 state-of-the-art baseline models in robustness and decoding performance.
  • The authors note that the code will be made public after acceptance.

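Since the paper's code has not yet been released, the adaptive fusion step can only be sketched from the description above. The following minimal NumPy illustration assumes the simplest reading of "learnable channel weights": one learnable logit per channel, mapped through a sigmoid to blend the temporal and spatial stream features. All function and variable names here are hypothetical, not the authors' API.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_fusion(temporal_feat, spatial_feat, channel_logits):
    """Blend dual-stream features with per-channel learnable weights.

    temporal_feat, spatial_feat: (channels, time) feature maps.
    channel_logits: (channels,) learnable parameters; sigmoid maps
    them to mixing weights in (0, 1), one per channel.
    """
    w = sigmoid(channel_logits)[:, None]           # (channels, 1)
    return w * temporal_feat + (1.0 - w) * spatial_feat

# Toy usage: two channels, three time steps.
t = np.ones((2, 3))        # stand-in temporal-stream features
s = np.zeros((2, 3))       # stand-in spatial-stream features
fused = adaptive_fusion(t, s, np.array([0.0, 10.0]))
# Channel 0 (logit 0) mixes the streams equally; channel 1 (logit 10)
# leans almost entirely on the temporal stream.
```

In a real network the logits would be optimized jointly with the rest of the model, letting each channel learn how much of each stream to keep.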
Abstract

Electroencephalography (EEG) provides a non-invasive window into brain activity, offering high temporal resolution crucial for understanding and interacting with neural processes through brain-computer interfaces (BCIs). Current dual-stream neural networks for EEG often process temporal and spatial features independently through parallel branches, delaying their integration until a final, late-stage fusion. This design inherently leads to an "information silo" problem, precluding intermediate cross-stream refinement and hindering spatial-temporal decompositions essential for full feature utilization. We propose LI-DSN, a layer-wise interactive dual-stream network that facilitates progressive, cross-stream communication at each layer, thereby overcoming the limitations of late-fusion paradigms. LI-DSN introduces a novel Temporal-Spatial Integration Attention (TSIA) mechanism, which constructs a Spatial Affinity Correlation Matrix (SACM) to capture inter-electrode spatial structural relationships and a Temporal Channel Aggregation Matrix (TCAM) to integrate cosine-gated temporal dynamics under spatial guidance. Furthermore, we employ an adaptive fusion strategy with learnable channel weights to optimize the integration of dual-stream features. Extensive experiments across eight diverse EEG datasets, encompassing motor imagery (MI) classification, emotion recognition, and steady-state visual evoked potentials (SSVEP), consistently demonstrate that LI-DSN significantly outperforms 13 state-of-the-art (SOTA) baseline models, showcasing its superior robustness and decoding performance. The code will be publicized after acceptance.
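The abstract's SACM/TCAM description can likewise only be approximated before the code release. The sketch below assumes SACM is a softmax-normalized cosine-affinity matrix over electrodes and TCAM aggregates each electrode's temporal signal from spatially related electrodes under a cosine gate; these are plausible readings of the text, not the paper's actual formulation, and every name is hypothetical.

```python
import numpy as np

def spatial_affinity(x):
    """SACM-style matrix (assumption): cosine similarity between
    electrode feature vectors, row-normalized with a softmax so each
    row is a distribution over spatially related electrodes."""
    xn = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    sim = xn @ xn.T                                # (C, C) cosine affinities
    e = np.exp(sim - sim.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def spatially_guided_temporal(x, affinity):
    """TCAM-style step (assumption): mix each electrode's temporal
    signal with a spatially aggregated version, gated by the cosine
    similarity between the two (mapped from [-1, 1] to [0, 1])."""
    agg = affinity @ x                             # (C, T) spatial mixing
    num = (x * agg).sum(axis=1)
    den = np.linalg.norm(x, axis=1) * np.linalg.norm(agg, axis=1) + 1e-8
    gate = ((num / den + 1.0) / 2.0)[:, None]      # cosine gate per electrode
    return gate * agg + (1.0 - gate) * x

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))                   # 8 electrodes, 16 samples
A = spatial_affinity(x)                            # rows sum to 1
y = spatially_guided_temporal(x, A)                # same shape as input
```

Because the interaction preserves the feature shape, a block like this can be inserted at every layer of both streams, which is the "layer-wise interactive" idea the abstract contrasts with late fusion.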