Seeing the imagined: a latent functional alignment in visual imagery decoding from fMRI data

arXiv cs.AI / 4/20/2026

Key Points

  • The paper examines whether a state-of-the-art fMRI-to-visual decoder trained for perception can be adapted to reconstruct mental imagery from the Imagery-NSD benchmark.
  • It proposes a “latent functional alignment” method that maps imagery-evoked brain activity into the pretrained model’s conditioning space while freezing the rest of the network (see the sketch after this list).
  • To address limited paired imagery-perception supervision, the authors add a retrieval-based augmentation that picks semantically related perception trials from NSD.
  • Experiments across four subjects show consistent gains in high-level semantic reconstruction versus both a frozen pretrained baseline and a voxel-space ridge alignment baseline.
  • The results indicate that semantic structure learned from perception can improve and stabilize visual imagery decoding under out-of-distribution conditions, achieving above-chance decoding across multiple cortical regions.
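
As a rough picture of that adaptation step, here is a minimal PyTorch sketch of a latent alignment stage: a single trainable map projects imagery-evoked voxel activity into the pretrained decoder's conditioning space, while every parameter of the decoder itself stays frozen. All names here (`LatentAlignment`, `cond_dim`, `fit_alignment`) are illustrative assumptions, and how the conditioning targets are obtained is not specified by the paper.

```python
# Sketch only: module and function names are hypothetical, not the paper's code.
import torch
import torch.nn as nn

class LatentAlignment(nn.Module):
    """Linear map from imagery-evoked voxel patterns to the pretrained
    decoder's conditioning space (e.g., an image-embedding space)."""

    def __init__(self, n_voxels: int, cond_dim: int):
        super().__init__()
        self.proj = nn.Linear(n_voxels, cond_dim)

    def forward(self, voxels: torch.Tensor) -> torch.Tensor:
        return self.proj(voxels)


def fit_alignment(align: LatentAlignment,
                  frozen_decoder: nn.Module,
                  voxels: torch.Tensor,
                  target_cond: torch.Tensor,
                  epochs: int = 50,
                  lr: float = 1e-3) -> LatentAlignment:
    """Train only the alignment map; the pretrained perception decoder
    stays frozen and is reused unchanged at reconstruction time."""
    for p in frozen_decoder.parameters():
        p.requires_grad_(False)

    opt = torch.optim.Adam(align.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        pred = align(voxels)                          # imagery activity -> conditioning latents
        loss = nn.functional.mse_loss(pred, target_cond)
        loss.backward()
        opt.step()
    return align
```

At reconstruction time, the frozen decoder would then be conditioned on `align(voxels)` in place of the perception-derived latents it was originally trained with.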

Abstract

Recent progress in visual brain decoding from fMRI has been enabled by large-scale datasets such as the Natural Scenes Dataset (NSD) and powerful diffusion-based generative models. While current pipelines are primarily optimized for perception, their performance under mental imagery remains less well understood. In this work, we study how a state-of-the-art (SOTA) perception decoder (DynaDiff) can be adapted to reconstruct imagined content from the Imagery-NSD benchmark. We propose a latent functional alignment approach that maps imagery-evoked activity into the pretrained model's conditioning space, while keeping the remaining components frozen. To mitigate the limited amount of matched imagery-perception supervision, we further introduce a retrieval-based augmentation strategy that selects semantically related NSD perception trials. Across four subjects, latent functional alignment consistently improves high-level semantic reconstruction metrics relative to the frozen pretrained baseline and a voxel-space ridge alignment baseline, and enables above-chance decoding from multiple cortical regions. These results suggest that semantic structure learned from perception can be leveraged to stabilize and improve visual imagery decoding under out-of-distribution conditions.
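
One way to picture the retrieval-based augmentation is as a nearest-neighbour lookup in an embedding space: each imagery item pulls in the NSD perception trials whose stimuli are semantically closest, and those trials supplement the scarce paired imagery-perception supervision. The sketch below assumes CLIP-style stimulus embeddings and a fixed top-k cutoff; both are illustrative choices, not details confirmed by the paper.

```python
# Sketch only: the embedding source and the top-k cutoff are assumptions.
import numpy as np

def retrieve_related_trials(imagery_emb: np.ndarray,
                            nsd_embs: np.ndarray,
                            k: int = 16) -> np.ndarray:
    """Indices of the k NSD perception trials whose stimulus embeddings
    are most cosine-similar to the imagery cue embedding."""
    q = imagery_emb / np.linalg.norm(imagery_emb)
    db = nsd_embs / np.linalg.norm(nsd_embs, axis=1, keepdims=True)
    sims = db @ q                        # cosine similarity to every NSD trial
    return np.argsort(-sims)[:k]         # top-k most similar trials

# Toy usage with random vectors standing in for real stimulus embeddings.
rng = np.random.default_rng(0)
imagery_emb = rng.normal(size=512)            # embedding of one imagery cue
nsd_embs = rng.normal(size=(10_000, 512))     # embeddings of NSD perception stimuli
extra_training_trials = retrieve_related_trials(imagery_emb, nsd_embs, k=8)
```

The retrieved perception trials would then be added to the data used to fit the alignment map, giving it more examples of how brain activity relates to the semantics the frozen decoder already understands.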