AI Navigate

Paper Title: LoV3D: Grounding Cognitive Prognosis Reasoning in Longitudinal 3D Brain MRI via Regional Volume Assessments

arXiv cs.CV / 3/13/2026


Key Points

  • LoV3D introduces a pipeline for training 3D vision-language models to read longitudinal T1-weighted brain MRI and output region-level anatomical assessments with longitudinal comparisons, ultimately providing a three-class cognitive diagnosis (Cognitively Normal, Mild Cognitive Impairment, or Dementia) and a synthesized diagnostic summary.
  • The approach grounds the final diagnosis by enforcing label consistency, longitudinal coherence, and biological plausibility to reduce the risk of hallucinations.
  • It trains a clinically-weighted Verifier that scores candidate outputs against normative references from standardized volume metrics, enabling Direct Preference Optimization without any human annotation.
  • On a subject-level held-out ADNI test set, LoV3D achieves 93.7% three-class diagnostic accuracy (a +34.8% improvement over a no-grounding baseline), 97.2% two-class accuracy, and strong zero-shot transfer performance across MIRIAD and AIBL, with code available at the provided GitHub repo.
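To make the annotation-free training idea concrete, here is a minimal sketch of how a clinically-weighted verifier could score candidate outputs against normative volume references and turn the ranking into (chosen, rejected) pairs for Direct Preference Optimization. All function names, region weights, and cutoffs below are illustrative assumptions, not the paper's actual rules.

```python
# Hypothetical sketch of the clinically-weighted Verifier idea: score each
# candidate output by how well its per-region calls agree with normative
# volume references, then pair best vs. worst candidates for DPO.
# Weights and cutoffs are invented for illustration.

def verify(candidate, norms, weights):
    """Score one candidate: add the region's clinical weight when its call
    matches the normative reference (e.g. 'atrophic' when the measured
    volume falls below the normative cutoff)."""
    score = 0.0
    for region, call in candidate["regions"].items():
        expected = ("atrophic"
                    if norms[region]["volume"] < norms[region]["cutoff"]
                    else "normal")
        if call == expected:
            score += weights.get(region, 1.0)
    return score

def make_dpo_pair(candidates, norms, weights):
    """Rank candidates by verifier score; return (chosen, rejected)."""
    ranked = sorted(candidates,
                    key=lambda c: verify(c, norms, weights),
                    reverse=True)
    return ranked[0], ranked[-1]

# Toy usage with one region and two candidate outputs.
norms = {"hippocampus": {"volume": 2.8, "cutoff": 3.0}}
weights = {"hippocampus": 2.0}  # clinically salient regions weigh more
cands = [
    {"regions": {"hippocampus": "atrophic"}},  # agrees with the reference
    {"regions": {"hippocampus": "normal"}},    # contradicts it
]
chosen, rejected = make_dpo_pair(cands, norms, weights)
```

The point of the sketch is the interface, not the rules: because the reference signal comes from standardized volume metrics rather than human labels, preference pairs can be generated at scale for DPO.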

Abstract

Longitudinal brain MRI is essential for characterizing the progression of neurological diseases such as Alzheimer's disease. However, current deep-learning tools fragment this process: classifiers reduce a scan to a label, volumetric pipelines produce uninterpreted measurements, and vision-language models (VLMs) may generate fluent but potentially hallucinated conclusions. We present LoV3D, a pipeline for training 3D vision-language models that reads longitudinal T1-weighted brain MRI, produces a region-level anatomical assessment, conducts a longitudinal comparison with the prior scan, and finally outputs a three-class diagnosis (Cognitively Normal, Mild Cognitive Impairment, or Dementia) along with a synthesized diagnostic summary. The stepped pipeline grounds the final diagnosis by enforcing label consistency, longitudinal coherence, and biological plausibility, thereby reducing the risk of hallucination. The training process introduces a clinically-weighted Verifier that automatically scores candidate outputs against normative references derived from standardized volume metrics, driving Direct Preference Optimization without a single human annotation. On a subject-level held-out ADNI test set (479 scans, 258 subjects), LoV3D achieves 93.7% three-class diagnostic accuracy (+34.8% over the no-grounding baseline), 97.2% two-class diagnostic accuracy (+4% over the SOTA), and 82.6% region-level anatomical classification accuracy (+33.1% over VLM baselines). Zero-shot transfer yields 95.4% on MIRIAD (100% Dementia recall) and 82.9% three-class accuracy on AIBL, confirming high generalizability across sites, scanners, and populations. Code is available at https://github.com/Anonymous-TEVC/LoV-3D.
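The three grounding constraints the abstract names can be pictured as simple predicates gating the final diagnosis. The rules below are assumptions made for intuition only; the paper's actual constraint definitions are not reproduced here.

```python
# Illustrative predicates for the three grounding checks named in the
# abstract: label consistency, longitudinal coherence, and biological
# plausibility. The specific rules are hypothetical stand-ins.

def label_consistent(region_calls, diagnosis):
    """Hypothetical rule: a Dementia diagnosis should be supported by at
    least one atrophic region-level call."""
    if diagnosis == "Dementia":
        return any(c == "atrophic" for c in region_calls.values())
    return True

def longitudinally_coherent(prior_volumes, current_volumes, tol=0.02):
    """Regional volumes should not grow between scans beyond a small
    measurement tolerance (atrophy is monotone in these diseases)."""
    return all(current_volumes[r] <= prior_volumes[r] * (1 + tol)
               for r in prior_volumes)

def biologically_plausible(volumes, bounds):
    """Each reported volume must lie within anatomically possible bounds."""
    return all(lo <= volumes[r] <= hi for r, (lo, hi) in bounds.items())

# Toy usage: a candidate diagnosis passes only if all three checks hold.
grounded = (
    label_consistent({"hippocampus": "atrophic"}, "Dementia")
    and longitudinally_coherent({"hippocampus": 3.1}, {"hippocampus": 3.0})
    and biologically_plausible({"hippocampus": 3.0},
                               {"hippocampus": (1.5, 5.0)})
)
```

A candidate output failing any predicate would be down-weighted or rejected before it can surface as a final diagnosis, which is the mechanism by which grounding suppresses hallucinated conclusions.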