MedLayBench-V: A Large-Scale Benchmark for Expert-Lay Semantic Alignment in Medical Vision Language Models

arXiv cs.CL / 4/8/2026


Key Points

  • It raises the problem that medical Vision-Language Models (Med-VLMs), despite high performance in interpreting diagnostic images, fall short in producing the patient-facing "lay register."
  • To address this gap, it proposes MedLayBench-V, a large-scale multimodal benchmark dedicated to expert-lay semantic alignment.
  • Because naive simplification alone risks hallucination, a Structured Concept-Grounded Refinement (SCGR) pipeline is used to enforce strict semantic equivalence.
  • SCGR integrates UMLS Concept Unique Identifiers (CUIs) with micro-level entity constraints, making the semantic correspondence verifiable by design.
  • MedLayBench-V aims to serve as a foundation for training and evaluating next-generation Med-VLMs that bridge communication between clinical experts and patients.

Abstract

Medical Vision-Language Models (Med-VLMs) have achieved expert-level proficiency in interpreting diagnostic imaging. However, current models are predominantly trained on professional literature, limiting their ability to communicate findings in the lay register required for patient-centered care. While text-centric research has actively developed resources for simplifying medical jargon, there is a critical absence of large-scale multimodal benchmarks designed to facilitate lay-accessible medical image understanding. To bridge this resource gap, we introduce MedLayBench-V, the first large-scale multimodal benchmark dedicated to expert-lay semantic alignment. Unlike naive simplification approaches that risk hallucination, our dataset is constructed via a Structured Concept-Grounded Refinement (SCGR) pipeline. This method enforces strict semantic equivalence by integrating Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs) with micro-level entity constraints. MedLayBench-V provides a verified foundation for training and evaluating next-generation Med-VLMs capable of bridging the communication divide between clinical experts and patients.
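To make the idea of CUI-grounded semantic equivalence concrete, here is a minimal illustrative sketch. The paper does not publish its implementation, so everything below is an assumption: the tiny hand-made term-to-CUI lexicon stands in for the full UMLS Metathesaurus, and a simple set comparison stands in for SCGR's micro-level entity constraints.

```python
# Minimal sketch of CUI-grounded equivalence checking (illustrative only).
# LEXICON is a toy stand-in for a UMLS term->CUI mapping; the real SCGR
# pipeline presumably uses the full Metathesaurus plus entity constraints.

LEXICON = {
    "myocardial infarction": "C0027051",  # expert term
    "heart attack": "C0027051",           # lay synonym, same concept
    "hypertension": "C0020538",
    "high blood pressure": "C0020538",
}

def extract_cuis(text: str, lexicon: dict[str, str] = LEXICON) -> frozenset[str]:
    """Map every lexicon term found in the text to its concept identifier."""
    lowered = text.lower()
    return frozenset(cui for term, cui in lexicon.items() if term in lowered)

def semantically_equivalent(expert: str, lay: str) -> bool:
    """Accept a lay rewrite only if it covers exactly the expert concepts."""
    return extract_cuis(expert) == extract_cuis(lay)

expert = "Imaging shows myocardial infarction with comorbid hypertension."
good_lay = "The scan shows a heart attack, along with high blood pressure."
bad_lay = "The scan shows a heart attack."  # drops the hypertension concept

print(semantically_equivalent(expert, good_lay))  # True
print(semantically_equivalent(expert, bad_lay))   # False
```

The key design point this sketch captures is that equivalence is checked at the concept level, not the surface-string level: "heart attack" and "myocardial infarction" map to the same CUI, so a lay rewrite passes only when it preserves the full concept set of the expert text.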
