Human Knowledge Integrated Multi-modal Learning for Single Source Domain Generalization
arXiv cs.CV / 3/16/2026
Key Points
- The paper introduces domain conformal bounds (DCB) to assess whether domains diverge in unknown causal factors, enabling objective evaluation of cross-domain generalization without access to data metadata.
- It proposes GenEval, a multimodal vision-language approach that combines foundation models (e.g., MedGemma-4B) with human knowledge via Low-Rank Adaptation (LoRA) to bridge causal gaps and improve single-source domain generalization (SDG).
- GenEval is evaluated on eight diabetic retinopathy datasets and two resting-state fMRI seizure onset zone datasets, achieving average accuracies of 69.2% for DR and 81% for SOZ, outperforming baselines by 9.4% and 1.8%, respectively.
- The work presents a general framework for assessing domain shifts and improving SDG in medical imaging through multimodal learning, and it may apply beyond the tested modalities.
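
For readers unfamiliar with LoRA, the sketch below illustrates the generic low-rank update it relies on: a frozen weight matrix W is adapted as W + (alpha / r) * B A, where only the small matrices A and B are trained. This is not the paper's implementation. The dimensions, rank `r`, scaling `alpha`, and all function names here are illustrative assumptions, since the summary does not report the authors' settings.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, a, b, alpha, r):
    """Return W + (alpha / r) * B @ A, leaving the frozen W unmodified."""
    delta = matmul(b, a)  # low-rank update with shape (d_out, d_in)
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def lora_param_counts(d_out, d_in, r):
    """Return (full fine-tuning params, LoRA trainable params) for one matrix."""
    return d_out * d_in, r * (d_out + d_in)


random.seed(0)
d_out, d_in, r, alpha = 6, 8, 2, 4  # hypothetical toy sizes

# Frozen pretrained weight (random stand-in for a real model layer).
w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Trainable factors: A gets a small random init, B starts at zero,
# so the adapted layer initially matches the pretrained one.
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
b = [[0.0] * r for _ in range(d_out)]

w_adapted = lora_weight(w, a, b, alpha, r)
full_params, lora_params = lora_param_counts(d_out, d_in, r)
print(f"trainable parameters: {lora_params} (LoRA) vs {full_params} (full)")
```

Because B is initialized to zero, the adapted weights equal the pretrained weights before any training step, and the number of trainable parameters grows with `r * (d_out + d_in)` rather than `d_out * d_in`.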




