Schema-Adaptive Tabular Representation Learning with LLMs for Generalizable Multimodal Clinical Reasoning

arXiv cs.AI / 4/15/2026


Key Points

  • The paper introduces Schema-Adaptive Tabular Representation Learning, an LLM-based method to improve schema generalization for tabular data by converting structured fields into semantic natural-language statements.
  • It enables zero-shot alignment of tabular embeddings across unseen EHR schemas without manual feature engineering or additional retraining.
  • The authors integrate the learned tabular encoder into a multimodal dementia diagnosis system by combining tabular EHR features with MRI data.
  • Experiments on NACC and ADNI show state-of-the-art results, including successful zero-shot transfer to new clinical schemas, outperforming clinical baselines in retrospective diagnostic tasks.
  • The work argues that LLM-driven semantic tabular encoding provides a scalable and robust pathway for applying LLM reasoning to heterogeneous structured healthcare data.
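The core schema-to-text idea in the first bullet can be sketched as follows. This is a minimal illustration, not the paper's implementation: the field names, description templates, and handling of missing fields are all assumptions made for the example.

```python
# Sketch: serialize structured EHR fields into natural-language statements
# that a pretrained LLM or text encoder could then embed.
# Field names and templates below are hypothetical, not from the paper.

FIELD_TEMPLATES = {
    "age": "The patient's age is {value} years.",
    "mmse": "The patient's Mini-Mental State Examination score is {value}.",
    "apoe4": "The patient carries {value} APOE-e4 allele(s).",
}

def serialize_row(row: dict) -> list[str]:
    """Turn one tabular record into semantic statements.

    Fields without a template (i.e. unseen schema columns in this toy
    mapping) and fields with missing values are simply skipped.
    """
    statements = []
    for field, value in row.items():
        template = FIELD_TEMPLATES.get(field)
        if template is None or value is None:
            continue
        statements.append(template.format(value=value))
    return statements

row = {"age": 74, "mmse": 22, "apoe4": 1, "site_code": "X7"}
for s in serialize_row(row):
    print(s)
```

Because each statement is plain text, embedding it with any pretrained sentence encoder yields vectors that do not depend on the original column names or ordering, which is what makes zero-shot alignment across schemas plausible.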

Abstract

Machine learning for tabular data remains constrained by poor schema generalization, a challenge rooted in the lack of semantic understanding of structured variables. This challenge is particularly acute in domains like clinical medicine, where electronic health record (EHR) schemas vary significantly. To solve this problem, we propose Schema-Adaptive Tabular Representation Learning, a novel method that leverages large language models (LLMs) to create transferable tabular embeddings. By transforming structured variables into semantic natural language statements and encoding them with a pretrained LLM, our approach enables zero-shot alignment across unseen schemas without manual feature engineering or retraining. We integrate our encoder into a multimodal framework for dementia diagnosis, combining tabular and MRI data. Experiments on NACC and ADNI datasets demonstrate state-of-the-art performance and successful zero-shot transfer to unseen schemas, significantly outperforming clinical baselines, including board-certified neurologists, in retrospective diagnostic tasks. These results validate our LLM-driven approach as a scalable, robust solution for heterogeneous real-world data, offering a pathway to extend LLM-based reasoning to structured domains.