[Oldie-But-A-Goodie] META Presents "TRIBE v2": A Next-Gen Model That Acts As A Digital Twin Of Human Neural Activity

Reddit r/LocalLLaMA / 4/10/2026


Key Points

  • Meta’s TRIBE v2 is presented as a tri-modal foundation model that predicts human brain activity from video, audio, and language inputs.
  • The approach is trained on a unified dataset of 1,000+ hours of fMRI from 720 subjects, aiming to generalize to novel stimuli, tasks, and previously unseen individuals.
  • Reported results indicate several-fold improvements over traditional linear encoding models, with more accurate high-resolution brain response predictions on new subjects.
  • TRIBE v2 is positioned as enabling “in silico” neuroscience, including reproducing findings from classic visual and neuro-linguistic paradigms without additional scanning.
  • By extracting interpretable latent features, TRIBE v2 is said to reveal fine-grained topography of multisensory integration across brain regions.

TL;DR:

META's New AI Can Predict Your Brain Better Than A Brain Scan.


Abstract:

Cognitive neuroscience is fragmented into specialized models, each tailored to specific experimental paradigms, which hinders the development of a unified model of cognition in the human brain. Here, we introduce TRIBE v2, a tri-modal (video, audio and language) foundation model capable of predicting human brain activity in a variety of naturalistic and experimental conditions.

Leveraging a unified dataset of over 1,000 hours of fMRI across 720 subjects, we demonstrate that our model accurately predicts high-resolution brain responses for novel stimuli, tasks and subjects, surpassing traditional linear encoding models with several-fold improvements in accuracy.

Critically, TRIBE v2 enables in silico experimentation: tested on seminal visual and neuro-linguistic paradigms, it recovers a variety of results established by decades of empirical research. Finally, by extracting interpretable latent features, TRIBE v2 reveals the fine-grained topography of multisensory integration.

These results establish artificial intelligence as a unifying framework for exploring the functional organization of the human brain.
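
For context, the "traditional linear encoding models" the abstract benchmarks against are usually ridge regressions from stimulus features to each voxel's response. Below is a minimal sketch of that baseline setup on synthetic data; the feature dimensions, encoder choices, and scoring are illustrative assumptions, not taken from the paper or its released code.

```python
# Illustrative linear encoding baseline: ridge regression from stimulus
# features to per-voxel fMRI responses. Purely synthetic data; this is NOT
# the TRIBE v2 architecture, just the classic baseline it is compared to.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n_trs = 2000        # fMRI timepoints (TRs)
n_voxels = 500      # voxels/parcels to predict (real data: tens of thousands)

# Hypothetical per-TR stimulus features from three frozen encoders
# (e.g. a video model, an audio model, a language model).
video_feat = rng.standard_normal((n_trs, 128))
audio_feat = rng.standard_normal((n_trs, 64))
text_feat  = rng.standard_normal((n_trs, 96))
X = np.hstack([video_feat, audio_feat, text_feat])

# Synthetic "brain responses": a linear mix of the features plus noise.
true_w = rng.standard_normal((X.shape[1], n_voxels)) * 0.1
Y = X @ true_w + rng.standard_normal((n_trs, n_voxels))

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, shuffle=False)

# One ridge model covering all voxels (sklearn fits multi-output natively).
model = Ridge(alpha=100.0).fit(X_tr, Y_tr)
Y_pred = model.predict(X_te)

# Standard encoding-model score: Pearson r between predicted and measured
# response, computed independently for every voxel.
def voxelwise_r(a, b):
    a = (a - a.mean(0)) / a.std(0)
    b = (b - b.mean(0)) / b.std(0)
    return (a * b).mean(0)

r = voxelwise_r(Y_te, Y_pred)
print(f"median voxel-wise r: {np.median(r):.3f}")
```

A model like TRIBE v2 would replace the fixed feature matrix and per-voxel linear map with jointly trained multimodal encoders and a shared predictor; the several-fold gains reported in the abstract are relative to this kind of linear baseline.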


Layman's Explanation:

TRIBE v2 is a foundation model trained on 1,000+ hours of brain imaging data from 720 people. You feed it a video, sound clip, or text, and it predicts:

  • Which brain regions light up

  • How strongly

  • And in what order

When tested on people it had never seen, the model's predictions were often more accurate than a single real brain scan (individual recordings get distorted by heartbeats, breathing, and head movement). Researchers then replicated decades of classic neuroscience experiments entirely inside the software.

No scanner, no human subjects.
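
The "better than a brain scan" framing usually refers to a noise-ceiling comparison: two fMRI runs of the same person watching the same clip only correlate modestly, so a good prediction can agree with one measurement better than a second measurement does. Here is a hedged sketch of that comparison; the noise levels are assumptions, not numbers from the paper.

```python
# Sketch of a noise-ceiling comparison: how a model prediction can correlate
# with one noisy fMRI measurement better than a second measurement does.
# Synthetic numbers only; not from the paper.
import numpy as np

rng = np.random.default_rng(1)
n_trs = 1000

true_signal = rng.standard_normal(n_trs)   # latent "clean" response
scan_noise = 1.5                           # assumed measurement noise (heartbeat, breathing, motion)

scan_a = true_signal + scan_noise * rng.standard_normal(n_trs)   # session 1
scan_b = true_signal + scan_noise * rng.standard_normal(n_trs)   # session 2 (repeat)
model_pred = true_signal + 0.5 * rng.standard_normal(n_trs)      # model output (smaller error, by assumption)

def r(x, y):
    return np.corrcoef(x, y)[0, 1]

print(f"scan-to-scan (test-retest) r: {r(scan_a, scan_b):.2f}")
print(f"model-to-scan r:              {r(model_pred, scan_a):.2f}")
```

Under these assumed noise levels the model-to-scan correlation beats the scan-to-scan one, which is the only sense in which a prediction can out-do "a real brain scan".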

The model correctly identified the brain's face recognition center, language network, and emotional processing regions on its own.
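
That kind of result is what an in silico localizer produces: predict responses to face stimuli and to control stimuli, take the contrast, and rank regions by selectivity. The sketch below uses a stand-in predictor with a hard-coded face-selective region purely to show the contrast logic; with the released weights you would substitute real model predictions and a real stimulus set.

```python
# Toy "in silico localizer": contrast predicted responses to two stimulus
# categories and report the most selective regions. The predictor here is a
# stand-in, not the TRIBE v2 model.
import numpy as np

rng = np.random.default_rng(2)
regions = ["V1", "FFA", "PPA", "A1", "Broca", "Precuneus"]

def predict_response(category: str) -> np.ndarray:
    """Stand-in for a model mapping a stimulus category to per-region activity."""
    base = rng.normal(1.0, 0.05, size=len(regions))
    boost = np.zeros(len(regions))
    if category == "faces":
        boost[regions.index("FFA")] = 0.8    # assumed face-selective boost
    elif category == "scenes":
        boost[regions.index("PPA")] = 0.8    # assumed scene-selective boost
    return base + boost

# Average predicted responses over many synthetic "trials" per category.
faces = np.mean([predict_response("faces") for _ in range(50)], axis=0)
scenes = np.mean([predict_response("scenes") for _ in range(50)], axis=0)

contrast = faces - scenes                    # classic faces > scenes contrast
top = np.argsort(contrast)[::-1]
for i in top[:2]:
    print(f"{regions[i]:10s} contrast = {contrast[i]:+.2f}")
```

The point is the procedure, not the numbers: no scanner or human subject is involved, only model outputs.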

My Thoughts:

Look at what else Meta has been building:

  • Ray-Ban smart glasses that see and hear what you do

  • A wristband that reads nerve signals

  • And now a model that predicts how your brain responds to any piece of content

There's no evidence these are all connected, but regardless, Meta now has a complete picture of attention, from the stimulus to the neural response.


Link to the Paper: https://ai.meta.com/research/publications/a-foundation-model-of-vision-audition-and-language-for-in-silico-neuroscience/

Link to the GitHub: https://github.com/facebookresearch/tribev2

Link to the Open-Sourced Weights: https://huggingface.co/facebook/tribev2
submitted by /u/44th--Hokage