CAMEL-CLIP: Channel-aware Multimodal Electroencephalography-text Alignment for Generalizable Brain Foundation Models

arXiv cs.LG / 3/17/2026

Key Points

  • CAMEL-CLIP introduces a channel-aware multimodal EEG-text alignment model designed to be robust to heterogeneous EEG channel configurations.
  • The model employs three key components: channel attribute-based positional encoding, dynamic channel projection, and dual-level contrastive learning at both the channel and sample level (sketched below).
  • Under linear probing, CAMEL-CLIP achieves state-of-the-art results and outperforms existing foundation models that rely on full fine-tuning.
  • The approach aims to enable more generalizable brain foundation models across diverse downstream EEG tasks and channel setups.
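To make the channel-handling ideas concrete, here is a minimal PyTorch-style sketch of the first two components. The module names (ChannelAttributePE, DynamicChannelProjection), the choice of channel attributes (e.g. normalized 3D electrode coordinates), and the tensor shapes are assumptions for illustration only; the paper's actual architecture may differ.

```python
# Hypothetical sketch of channel attribute-based positional encoding and
# dynamic channel projection; names and attribute choices are illustrative.
import torch
import torch.nn as nn


class ChannelAttributePE(nn.Module):
    """Encode each channel from its semantic attributes (e.g. normalized 3D
    electrode coordinates) rather than its index, so the encoding does not
    depend on channel count or ordering."""

    def __init__(self, attr_dim: int, d_model: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(attr_dim, d_model),
            nn.GELU(),
            nn.Linear(d_model, d_model),
        )

    def forward(self, channel_attrs: torch.Tensor) -> torch.Tensor:
        # channel_attrs: (batch, n_channels, attr_dim); n_channels may vary
        return self.proj(channel_attrs)            # (batch, n_channels, d_model)


class DynamicChannelProjection(nn.Module):
    """Project every channel independently with shared weights, producing a
    variable-length sequence of channel tokens with no cross-channel pooling."""

    def __init__(self, n_timepoints: int, d_model: int):
        super().__init__()
        self.proj = nn.Linear(n_timepoints, d_model)

    def forward(self, eeg: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
        # eeg: (batch, n_channels, n_timepoints); pos from ChannelAttributePE
        tokens = self.proj(eeg)                    # (batch, n_channels, d_model)
        return tokens + pos                        # channel-aware token sequence
```

Because both modules operate per channel with shared weights, the same parameters accept recordings with different montages; a downstream encoder only sees a variable-length sequence of channel tokens.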

Abstract

Electroencephalography (EEG) foundation models have shown promise for learning generalizable representations, yet they remain sensitive to channel heterogeneity, such as changes in channel composition or ordering. We propose channel-aware multimodal EEG-text alignment contrastive language-image pretraining (CAMEL-CLIP), a contrastive EEG-text multimodal foundation model designed to be robust to heterogeneous channel configurations and widely applicable to diverse downstream tasks. CAMEL-CLIP introduces three key components: (1) channel attribute-based positional encoding, which identifies channels through semantic information; (2) dynamic channel projection, which generates variable-length embeddings by independently projecting each channel without feature compression; and (3) dual-level contrastive learning, which jointly performs channel-level and sample-level contrastive learning to capture both channel-specific and global signal characteristics. Experimental results demonstrate that CAMEL-CLIP achieves state-of-the-art performance under linear probing and outperforms existing foundation models that rely on full fine-tuning.
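The dual-level objective can be pictured as two CLIP-style InfoNCE terms. The sketch below is an assumption about how such a loss might be combined: the pairing of individual channel tokens with their sample's text embedding, the symmetric InfoNCE form, and the weighting factor alpha are all illustrative choices, not the paper's confirmed formulation.

```python
# Minimal sketch of a dual-level (channel-level + sample-level) contrastive
# loss, assuming CLIP-style symmetric InfoNCE; pairing scheme and weighting
# are assumptions for illustration.
import torch
import torch.nn.functional as F


def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over matched rows of a and b, both of shape (N, d)."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


def dual_level_loss(channel_emb: torch.Tensor,   # (B, C, d) per-channel EEG tokens
                    sample_emb: torch.Tensor,    # (B, d) pooled EEG embedding
                    text_emb: torch.Tensor,      # (B, d) text embedding
                    alpha: float = 0.5) -> torch.Tensor:
    B, C, d = channel_emb.shape
    # Channel-level term: align each channel token with its sample's text
    # embedding (a simplification; duplicated positives act as negatives here).
    chan = channel_emb.reshape(B * C, d)
    text_per_chan = text_emb.unsqueeze(1).expand(B, C, d).reshape(B * C, d)
    l_channel = info_nce(chan, text_per_chan)
    # Sample-level term: standard CLIP-style alignment of pooled EEG and text.
    l_sample = info_nce(sample_emb, text_emb)
    return alpha * l_channel + (1.0 - alpha) * l_sample
```

The intent captured by the sketch is that the channel-level term pushes individual channel tokens toward text semantics, while the sample-level term aligns the global EEG representation with the paired description, matching the abstract's claim of capturing both channel-specific and global signal characteristics.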