Structure-Guided Diffusion Model for EEG-Based Visual Cognition Reconstruction

arXiv cs.CV / 27 Apr 2026


Key Points

  • The paper introduces a Structure-Guided Diffusion Model (SGDM) to reconstruct visual cognition from EEG, aiming to move beyond prior approaches limited to natural-image constraints and categorical outputs.
  • SGDM uses a two-stage generative pipeline that combines a structurally supervised variational autoencoder, a spatiotemporal EEG encoder aligned to a visual embedding space via contrastive learning, and a diffusion model guided by ControlNet.
  • Experiments on both the Kilogram abstract visual object dataset and the THINGS natural image dataset show that SGDM outperforms existing methods, improving both low-level visual fidelity and semantic reconstruction quality.
  • Spatiotemporal EEG analyses suggest hierarchical structural encoding consistent with visual cognitive dynamics, supporting the model’s ability to capture explicit structural geometry.
  • The work positions SGDM as a way to increase the degrees of freedom in BCI intention decoding by enabling more flexible brain-to-machine communication from complex visual content.
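The contrastive alignment step mentioned above (mapping a spatiotemporal EEG encoder into a visual embedding space) is typically implemented with a symmetric InfoNCE objective over paired EEG/image embeddings. The sketch below is an illustrative reconstruction of that generic loss, not the paper's exact formulation; the function name, temperature value, and numpy implementation are assumptions for clarity.

```python
import numpy as np

def info_nce_loss(eeg_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE loss aligning EEG embeddings with image embeddings.

    eeg_emb, img_emb: (batch, dim) arrays of paired embeddings, where row i
    of each array comes from the same stimulus (the positive pair).
    Illustrative sketch only; SGDM's actual loss may differ in detail.
    """
    # L2-normalize so dot products become cosine similarities.
    eeg = eeg_emb / np.linalg.norm(eeg_emb, axis=1, keepdims=True)
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    logits = eeg @ img.T / temperature        # (batch, batch) similarity matrix
    labels = np.arange(len(logits))           # positives sit on the diagonal

    def xent(l):
        # Cross-entropy against the diagonal targets, numerically stabilized.
        l = l - l.max(axis=1, keepdims=True)
        log_p = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_p[labels, labels].mean()

    # Average the EEG->image and image->EEG directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

Training with such a loss pulls each EEG embedding toward the embedding of the image it evoked while pushing it away from the other images in the batch, which is what lets the frozen visual embedding space "receive" EEG features downstream.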

Abstract

Objective: Decoding visual information from electroencephalography (EEG) is an important problem in neuroscience and brain-computer interface (BCI) research. Existing methods are largely restricted to natural images and categorical representations, with limited capacity to capture structural features and to differentiate objective perception from subjective cognition. We propose a Structure-Guided Diffusion Model (SGDM) that incorporates explicit structural information for EEG-based visual reconstruction.

Approach: SGDM is evaluated on the Kilogram abstract visual object dataset and the THINGS natural image dataset using a two-stage generative mechanism. The framework combines a structurally supervised variational autoencoder with a spatiotemporal EEG encoder aligned to a visual embedding space via contrastive learning. Structural information is integrated into a diffusion model through ControlNet to guide image generation from EEG features.

Results: SGDM outperforms existing methods on both abstract and natural image datasets. Reconstructed images achieve higher fidelity in low-level visual features and semantic representations, indicating improved decoding accuracy and strong generalization across diverse visual domains. Spatiotemporal analysis of EEG signals further reveals hierarchical structural encoding patterns, consistent with the neural dynamics of visual cognition.

Significance: These findings validate the effectiveness of SGDM in capturing explicit structural geometry and generating images with high fidelity to individual cognitive representations. By enabling decoding of complex visual content from EEG signals, the framework extends neural decoding beyond low-dimensional or categorical outputs. This supports BCIs with increased degrees of freedom for intention decoding and more flexible brain-to-machine communication.
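ControlNet, which the abstract names as the mechanism for injecting structural information into the diffusion model, rests on one key design: a trainable side branch whose final projection is zero-initialized, so that at the start of training the combined model behaves exactly like the frozen pretrained generator. The toy sketch below illustrates that idea in plain numpy; the class name, dimensions, and tanh nonlinearity are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

class ZeroInitControlBranch:
    """Toy ControlNet-style conditioning branch (illustrative sketch).

    A trainable side network maps a structural condition (e.g. contour
    features) to a residual that is added to a frozen backbone's features.
    Its output projection starts at zero -- an analogue of ControlNet's
    "zero convolutions" -- so guidance is learned gradually without
    disturbing the pretrained generator at initialization.
    """

    def __init__(self, cond_dim, feat_dim, rng):
        self.w_in = rng.normal(scale=0.02, size=(cond_dim, feat_dim))
        self.w_out = np.zeros((feat_dim, feat_dim))  # zero-initialized output

    def __call__(self, backbone_feat, condition):
        hidden = np.tanh(condition @ self.w_in)
        # At init w_out is all zeros, so this returns backbone_feat unchanged.
        return backbone_feat + hidden @ self.w_out
```

As `w_out` moves away from zero during training, the branch progressively steers generation toward the structural condition, which is how explicit geometry can guide image synthesis without retraining the diffusion backbone.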