NeuroFlow: Toward Unified Visual Encoding and Decoding from Neural Activity

arXiv cs.LG / 4/14/2026


Key Points

  • The paper proposes NeuroFlow, a unified framework that models both visual encoding (stimulus → neural activity) and decoding (neural activity → stimulus) in a single reversible flow model rather than using separate pipelines.
  • NeuroFlow builds on a variational backbone (NeuroVAE) that learns a compact, semantically structured latent space, supporting bidirectional modeling across visual and neural modalities.
  • It introduces Cross-modal Flow Matching (XFM), which learns a reversibly consistent mapping between visual and neural latent distributions, improving consistency without relying on modality-specific diffusion-style noise-to-data conditioning.
  • Experiments show NeuroFlow delivers better overall performance on both encoding and decoding tasks while maintaining higher computational efficiency than approaches that treat the tasks independently.
  • The authors further analyze what drives encoding–decoding consistency and report brain functional analysis indicating the model captures activation patterns that reflect neural variability, aiming to inform future bidirectional visual brain-computer interfaces.
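The cross-modal flow-matching idea in the key points can be illustrated with a toy objective: sample a paired visual latent and neural latent, interpolate between them, and regress a velocity field toward the straight-line displacement. This is a minimal sketch of generic flow matching between two latent distributions, assuming linear interpolation paths; the function and variable names are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def xfm_training_pair(z_visual, z_neural, t):
    """Point on the linear path between the two latents at time t, plus the
    constant velocity target a flow-matching regressor v_theta would fit."""
    z_t = (1.0 - t) * z_visual + t * z_neural  # interpolated latent
    v_target = z_neural - z_visual             # straight-path velocity
    return z_t, v_target

# Toy latents standing in for paired visual/neural embeddings.
z_vis = rng.normal(size=(4, 8))
z_neu = rng.normal(size=(4, 8))
z_t, v = xfm_training_pair(z_vis, z_neu, t=0.25)

# A velocity network would minimize E || v_theta(z_t, t) - v_target ||^2;
# here we only check that the path endpoints are the two latents.
assert np.allclose(xfm_training_pair(z_vis, z_neu, 0.0)[0], z_vis)
assert np.allclose(xfm_training_pair(z_vis, z_neu, 1.0)[0], z_neu)
```

Because the regression target is defined symmetrically between the two endpoints, the same learned field can in principle be traversed in either direction, which is what makes a single model usable for both encoding and decoding.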

Abstract

Visual encoding and decoding models act as gateways to understanding the neural mechanisms underlying human visual perception. Typically, visual encoding models that predict brain activity from stimuli and decoding models that reproduce stimuli from brain activity are treated as distinct tasks, requiring separate models and training procedures. This separation is inefficient and fails to model the consistency between encoding and decoding processes. To address this limitation, we propose NeuroFlow, the first unified framework that jointly models visual encoding and decoding from neural activity within a single flow model. NeuroFlow introduces two key components: (1) NeuroVAE is designed as a variational backbone to model neural variability and establish a compact, semantically structured latent space for bidirectional modeling across visual and neural modalities. (2) Cross-modal Flow Matching (XFM) bypasses the typical paradigm of noise-to-data diffusion guided by a specific modality condition, instead learning a reversibly consistent flow model between visual and neural latent distributions. For the first time, visual encoding and decoding are reformulated as a time-dependent, reversible process within a shared latent space for unified modeling. Empirical results demonstrate that NeuroFlow achieves superior overall performance in visual encoding and decoding tasks with higher computational efficiency than isolated methods. We further analyze principal factors that steer the model toward encoding-decoding consistency and, through brain functional analyses, demonstrate that NeuroFlow captures consistent activation patterns underlying neural variability. NeuroFlow marks a major step toward unified visual encoding and decoding from neural activity, providing mechanistic insights that inform future bidirectional visual brain-computer interfaces.
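The abstract's reformulation of encoding and decoding as one time-dependent, reversible process can be sketched as integrating a single velocity field forward (stimulus latent → neural latent) and backward (neural latent → stimulus latent). This is a toy illustration under stated assumptions: the linear rotation field and Euler integrator stand in for a trained network and a proper ODE solver, and are not the paper's model.

```python
import numpy as np

def integrate(z, velocity, t0, t1, steps=1000):
    """Euler-integrate dz/dt = velocity(z, t) from time t0 to t1."""
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        z = z + dt * velocity(z, t)
        t += dt
    return z

# Toy time-independent velocity field (a rotation generator); a learned
# v_theta(z, t) would take its place in an actual flow model.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
velocity = lambda z, t: z @ A.T

z_visual = np.array([1.0, 0.0])
z_neural = integrate(z_visual, velocity, 0.0, 1.0)  # "encoding": forward in time
z_back = integrate(z_neural, velocity, 1.0, 0.0)    # "decoding": reverse the flow

# Euler integration is only approximately reversible; the round-trip error
# shrinks as the step count grows.
assert np.allclose(z_back, z_visual, atol=1e-2)
```

The round-trip check above is a crude stand-in for the encoding-decoding consistency the paper analyzes: a single field traversed in both directions keeps the two tasks coupled, rather than leaving consistency to two independently trained pipelines.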