EEG2Vision: A Multimodal EEG-Based Framework for 2D Visual Reconstruction in Cognitive Neuroscience

arXiv cs.CV / 4/10/2026


Key Points

  • The paper introduces EEG2Vision, a modular end-to-end framework that reconstructs 2D images from non-invasive EEG under realistic, low-density electrode setups.
  • EEG-to-image reconstruction is built on an EEG-conditioned diffusion approach, and a prompt-guided post-reconstruction boosting stage is added to refine geometry and perceptual coherence.
  • The boosting mechanism uses a multimodal large language model to extract semantic descriptions and then applies image-to-image diffusion to improve visual quality while keeping EEG-grounded structure.
  • Results show that reducing EEG channels sharply hurts semantic decoding accuracy (e.g., 50-way Top-1 accuracy drops from 89% to 38%), while perceptual reconstruction quality degrades only slightly (e.g., FID from 76.77 to 80.51).
  • The boosting stage delivers consistent perceptual improvements, including up to 9.71% IS gains in low-channel settings, and a user study indicates participants prefer boosted reconstructions, supporting the feasibility of real-time brain-to-image applications outside the lab.
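The boosting stage described above (MLLM caption extraction followed by image-to-image diffusion) can be sketched as a generic two-step refinement. Note that `describe_image` and `img2img` here are hypothetical placeholders standing in for any multimodal captioner and any image-to-image diffusion model, not the paper's actual components:

```python
from typing import Callable

def boost_reconstruction(eeg_image,
                         describe_image: Callable,
                         img2img: Callable,
                         strength: float = 0.5):
    """Prompt-guided post-reconstruction boosting (sketch).

    eeg_image:      the initial EEG-conditioned diffusion reconstruction.
    describe_image: a multimodal captioner mapping an image to a text prompt.
    img2img:        an image-to-image diffusion call taking
                    (prompt, image, strength).

    A moderate `strength` keeps the EEG-grounded layout of the input
    image while the extracted prompt guides the refinement of geometry
    and perceptual coherence.
    """
    # Step 1: extract a semantic description of the rough reconstruction.
    prompt = describe_image(eeg_image)
    # Step 2: refine the image conditioned on that description.
    return img2img(prompt=prompt, image=eeg_image, strength=strength)
```

In practice `img2img` would wrap a diffusion model's image-to-image entry point; the key design choice is the `strength` parameter, which trades off fidelity to the EEG-derived image against the prompt-driven refinement.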

Abstract

Reconstructing visual stimuli from non-invasive electroencephalography (EEG) remains challenging due to its low spatial resolution and high noise, particularly under realistic low-density electrode configurations. To address this, we present EEG2Vision, a modular, end-to-end EEG-to-image framework that systematically evaluates reconstruction performance across different EEG resolutions (128, 64, 32, and 24 channels) and enhances visual quality through a prompt-guided post-reconstruction boosting mechanism. Starting from EEG-conditioned diffusion reconstruction, the boosting stage uses a multimodal large language model to extract semantic descriptions and leverages image-to-image diffusion to refine geometry and perceptual coherence while preserving EEG-grounded structure. Our experiments show that semantic decoding accuracy degrades significantly with channel reduction (e.g., 50-way Top-1 Acc from 89% to 38%), while reconstruction quality decreases only slightly (e.g., FID from 76.77 to 80.51). The proposed boosting consistently improves perceptual metrics across all configurations, achieving up to 9.71% IS gains in low-channel settings. A user study confirms a clear perceptual preference for boosted reconstructions. The proposed approach significantly boosts the feasibility of real-time brain-to-image applications using low-resolution EEG devices, potentially unlocking such applications outside laboratory settings.
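The FID numbers quoted in the abstract (76.77 vs. 80.51) measure the Fréchet distance between Gaussians fitted to Inception features of real and reconstructed images. As a reminder of what that metric computes, here is a minimal sketch of the distance itself, assuming the Inception-v3 feature extraction has already been done:

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Fréchet (FID-style) distance between Gaussians fitted to two
    feature sets of shape (N, D); lower means the two distributions
    match more closely.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # tr((cov_a @ cov_b)^{1/2}) via the eigenvalues of the product,
    # which are real and non-negative for PSD factors.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff
                 + np.trace(cov_a) + np.trace(cov_b)
                 - 2.0 * tr_sqrt)
```

Identical feature sets give a distance near zero, and the value grows as the reconstructed distribution drifts from the real one, which is why the small 76.77 → 80.51 shift under channel reduction indicates only mild perceptual degradation.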