Text-Attributed Knowledge Graph Enrichment with Large Language Models for Medical Concept Representation

arXiv cs.LG / 4/16/2026


Key Points

  • The paper introduces CoMed, an LLM-empowered framework that improves medical concept representation for EHR mining by enriching knowledge graphs with semantic information derived from clinical text and code relations.
  • It addresses missing cross-type dependencies and incomplete clinical semantics by constructing a global KG from EHR-mined associations and LLM-inferred, type-constrained relations.
  • CoMed further enriches the KG into a text-attributed graph by generating node descriptions and edge rationales, which provides training signals for both concepts and their interconnections.
  • The method jointly trains a LoRA-tuned LLaMA text encoder with a heterogeneous GNN to fuse text semantics and graph structure into unified medical concept embeddings.
  • Experiments on MIMIC-III and MIMIC-IV report consistent improvements in downstream prediction and demonstrate that CoMed can function as a plug-in concept encoder in standard EHR pipelines.
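The first step of the pipeline, mining statistically reliable code associations from EHR visits, can be sketched with a simple pointwise mutual information (PMI) filter over co-occurring codes. This is an illustrative stand-in: the function name, thresholds, and PMI criterion are assumptions, not the paper's exact association statistics.

```python
from collections import Counter
from itertools import combinations
import math

def mine_cooccurrence_edges(visits, min_count=2, pmi_threshold=0.0):
    """Mine code-code association edges from EHR visits.

    `visits` is a list of sets of medical codes (one set per visit).
    A pair becomes a KG edge when it co-occurs at least `min_count`
    times and its PMI exceeds `pmi_threshold`. Hypothetical sketch,
    not CoMed's actual mining procedure.
    """
    n = len(visits)
    code_count = Counter()
    pair_count = Counter()
    for codes in visits:
        code_count.update(codes)
        pair_count.update(combinations(sorted(codes), 2))
    edges = []
    for (a, b), c in pair_count.items():
        if c < min_count:
            continue
        # PMI compares observed co-occurrence to independence.
        pmi = math.log((c / n) / ((code_count[a] / n) * (code_count[b] / n)))
        if pmi > pmi_threshold:
            edges.append((a, b, pmi))
    return edges

# Toy visits mixing diagnosis (ICD) and medication (ATC) codes.
visits = [
    {"ICD:E11", "ATC:A10B"},
    {"ICD:E11", "ATC:A10B", "ICD:I10"},
    {"ICD:I10", "ATC:C09A"},
]
print(mine_cooccurrence_edges(visits))
```

Edges mined this way would then be complemented by LLM-inferred, type-constrained relations before the graph is enriched with text attributes.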

Abstract

In electronic health record (EHR) mining, learning high-quality representations of medical concepts (e.g., standardized diagnosis, medication, and procedure codes) is fundamental for downstream clinical prediction. However, robust concept representation learning is hindered by two key challenges: (i) clinically important cross-type dependencies (e.g., diagnosis-medication and medication-procedure relations) are often missing or incomplete in existing ontology resources, limiting the ability to model complex EHR patterns; and (ii) rich clinical semantics are often missing from structured resources, and even when available as text, are difficult to integrate with knowledge graph (KG) structure for representation learning. To address these challenges, we present CoMed, an LLM-empowered graph learning framework for medical concept representation. CoMed first builds a global KG over medical codes by combining statistically reliable associations mined from EHRs with type-constrained LLM prompting to infer semantic relations. It then utilizes LLMs to enrich the KG into a text-attributed graph by generating node descriptions and edge rationales, providing semantic signals for both concepts and their relationships. Finally, CoMed jointly trains a LoRA-tuned LLaMA text encoder with a heterogeneous GNN, fusing text semantics and graph structure into unified concept embeddings. Extensive experiments on MIMIC-III and MIMIC-IV show that CoMed consistently improves prediction performance and serves as an effective plug-in concept encoder for standard EHR pipelines.
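The final fusion step, combining text semantics with graph structure, can be sketched as message passing over text-derived node embeddings. In this minimal sketch, `text_emb` stands in for outputs of the LoRA-tuned LLaMA encoder over node descriptions, and mean-neighbor aggregation stands in for the heterogeneous GNN; the function and its signature are assumptions for illustration, not CoMed's architecture.

```python
import numpy as np

def fuse_text_and_graph(text_emb, adj, num_layers=2):
    """Fuse text embeddings with KG structure via message passing.

    `text_emb` is an (N, d) array of per-node text embeddings;
    `adj` is an (N, N) binary adjacency matrix of the enriched KG.
    Each layer replaces a node's embedding with the mean over itself
    and its neighbors. Illustrative stand-in for a heterogeneous GNN,
    which would use relation-type-specific aggregation instead.
    """
    h = text_emb.astype(float)
    # Add self-loops and row-normalize so each update is a mean
    # over the node and its neighbors.
    a = adj + np.eye(adj.shape[0])
    a = a / a.sum(axis=1, keepdims=True)
    for _ in range(num_layers):
        h = a @ h
    return h

# Three nodes on a path graph: 0 - 1 - 2.
text_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
z = fuse_text_and_graph(text_emb, adj)
```

After propagation, embeddings of connected concepts move closer, so graph structure is reflected in the final representation alongside the text semantics.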