DLink: Distilling Layer-wise and Dominant Knowledge from EEG Foundation Models

arXiv cs.LG / 4/17/2026

📰 News · Developer Stack & Infrastructure · Models & Research

Key Points

  • The paper introduces DLink, a knowledge-distillation framework designed specifically for EEG foundation models that are expensive to run on embedded BCI devices.
  • It argues that standard distillation struggles for EEG FMs because relevant information is spread across intermediate layers and naive dimensionality reduction can cause representational collapse and frequency/oscillation distortion.
  • DLink uses a dynamic Router to adaptively combine teacher layers, an EEG MiC student trained with a Mimic-then-Compress pipeline, and spectral distillation that aligns representations in the frequency domain to reduce aliasing and temporal jitter.
  • Experiments on four EEG benchmarks show that compact “student” models surpass lightweight baselines and approach fully fine-tuned foundation-model performance at much lower model size and inference cost.
  • Overall, the work provides a practical strategy for deploying EEG foundation-model capabilities in resource-constrained embedded systems while preserving oscillatory structure.
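The "dynamic Router" in the points above can be pictured as a learned convex combination of teacher layer outputs. Below is a minimal sketch under that assumption: the routing weights come from a softmax over per-layer logits (in DLink the weights are learned and presumably input-dependent; here they are fixed, and the function names are hypothetical).

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(x - x.max())
    return e / e.sum()

def route_layers(layer_feats, logits):
    """Aggregate teacher layer features into one "dominant" representation.

    layer_feats: list of (T, D) arrays, one per teacher layer.
    logits:      (L,) routing logits; softmax turns them into convex weights.
    """
    w = softmax(logits)                      # (L,) weights summing to 1
    stacked = np.stack(layer_feats)          # (L, T, D)
    return np.tensordot(w, stacked, axes=1)  # (T, D) weighted combination

rng = np.random.default_rng(0)
feats = [rng.normal(size=(4, 8)) for _ in range(3)]  # 3 teacher layers
agg = route_layers(feats, np.array([0.1, 2.0, -1.0]))
print(agg.shape)  # (4, 8)
```

The student would then be trained to mimic `agg` rather than only the final teacher layer, which is one way to capture semantics spread across intermediate layers.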

Abstract

EEG foundation models (FMs) achieve strong cross-subject and cross-task generalization but impose substantial computational and memory costs that hinder deployment on embedded BCI systems. Knowledge distillation is a natural solution; however, conventional methods fail for EEG FMs because task-relevant semantics are often distributed across intermediate layers, and aggressive dimensionality reduction can distort oscillatory structure via representational collapse and aliasing. To address these challenges, we propose DLink (Distilling Layer-wise and Dominant Knowledge), a unified framework for transferring knowledge from large EEG FMs to compact students with three key innovations: (1) a dynamic Router that adaptively aggregates teacher layers to capture dominant intermediate representations; (2) an EEG MiC student with a Mimic-then-Compress pipeline, which inherits high-dimensional teacher features and then applies structured spatio-temporal compression to avoid a heavy classification head; and (3) spectral distillation that aligns teacher-student representations in the frequency domain to regularize compression and mitigate aliasing and temporal jitter. Experiments on four EEG benchmarks show that DLink enables compact students to outperform lightweight baselines while approaching fully fine-tuned FM performance at substantially lower model size and inference cost.
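The spectral-distillation idea in the abstract, aligning teacher and student representations in the frequency domain, could be sketched as a loss on magnitude spectra. This is one plausible reading, not the paper's exact formulation: here the loss is the MSE between log-magnitude rFFT spectra along the temporal axis.

```python
import numpy as np

def spectral_loss(teacher_feat, student_feat):
    """MSE between log-magnitude rFFT spectra along the temporal axis.

    Penalizing spectral mismatch discourages the compressed student from
    introducing aliasing or distorting oscillatory structure. A plausible
    sketch of "spectral distillation"; DLink's actual loss may differ.
    """
    T = np.abs(np.fft.rfft(teacher_feat, axis=0))
    S = np.abs(np.fft.rfft(student_feat, axis=0))
    return float(np.mean((np.log1p(T) - np.log1p(S)) ** 2))

rng = np.random.default_rng(1)
teacher = rng.normal(size=(64, 8))  # (time, channels) teacher features
student = rng.normal(size=(64, 8))  # a mismatched student, for illustration
print(spectral_loss(teacher, teacher))  # 0.0 for identical features
print(spectral_loss(teacher, student))  # positive when spectra differ
```

Because the loss is computed on magnitudes, it is insensitive to small phase shifts, which is one way a frequency-domain term can tolerate temporal jitter that a plain time-domain MSE would penalize.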