Guideline2Graph: Profile-Aware Multimodal Parsing for Executable Clinical Decision Graphs

arXiv cs.LG / 4/6/2026


Key Points

  • Guideline2Graph proposes a decomposition-first pipeline to convert multimodal, branching clinical practice guidelines into executable clinical decision support (CDS) decision graphs with preserved cross-page continuity.
  • The method uses topology-aware chunking, interface-constrained chunk graph generation with explicit entry/terminal interfaces, and provenance-preserving global aggregation to keep induced control flow auditable and consistent.
  • Unlike one-shot or mostly local/text-centric LLM/VLM extraction approaches, it applies semantic deduplication and global consolidation to better represent end-to-end guideline control flow.
  • Evaluated on an adjudicated prostate-guideline benchmark using matched inputs and the same underlying VLM backbone, the approach improves edge and triplet precision/recall on the complete merged graph from 19.6%/16.1% for existing methods to 69.0%/87.5%, and raises node recall from 78.1% to 93.8%.
  • The authors conclude the approach is promising but note evidence is currently limited to a single adjudicated prostate guideline, motivating broader multi-guideline validation.
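
To make the aggregation step in the key points concrete, here is a minimal sketch of how per-chunk graphs with explicit entry/terminal interfaces might be merged while recording provenance. It is an illustration under stated assumptions, not the authors' implementation: the names `ChunkGraph`, `normalize`, `aggregate`, and `stitch_points` are hypothetical, the clinical labels are toy values, and "semantic deduplication" is approximated by case- and whitespace-insensitive label matching, where a real system would presumably use a stronger similarity method.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkGraph:
    """Hypothetical per-chunk graph with explicit interfaces."""
    chunk_id: str                          # e.g. a page range
    edges: list[tuple[str, str, str]]      # (source, condition, target)
    entries: list[str] = field(default_factory=list)    # entry interface
    terminals: list[str] = field(default_factory=list)  # terminal interface


def normalize(label: str) -> str:
    # Stand-in for semantic deduplication (assumption): lowercase the
    # label and collapse runs of whitespace.
    return " ".join(label.lower().split())


def aggregate(chunks):
    """Merge chunk graphs, recording which chunks support each node/edge."""
    nodes = {}  # node key -> {"label": first label seen, "provenance": chunk ids}
    edges = {}  # (src key, condition key, dst key) -> chunk ids
    for chunk in chunks:
        for src, cond, dst in chunk.edges:
            src_key, dst_key = normalize(src), normalize(dst)
            for key, label in ((src_key, src), (dst_key, dst)):
                node = nodes.setdefault(key, {"label": label, "provenance": set()})
                node["provenance"].add(chunk.chunk_id)
            edge_key = (src_key, normalize(cond), dst_key)
            edges.setdefault(edge_key, set()).add(chunk.chunk_id)
    return nodes, edges


def stitch_points(chunks):
    """Nodes that end one chunk and begin another (cross-page continuity)."""
    shared = set()
    for left in chunks:
        for right in chunks:
            if left is not right:
                ends = {normalize(t) for t in left.terminals}
                starts = {normalize(e) for e in right.entries}
                shared |= ends & starts
    return shared


# Toy example: two chunks from adjacent page ranges (illustrative labels only).
page_a = ChunkGraph(
    "p1-2",
    [("Elevated PSA", "confirmed", "Risk stratification")],
    entries=["Elevated PSA"],
    terminals=["Risk stratification"],
)
page_b = ChunkGraph(
    "p3-4",
    [("risk  stratification", "low risk", "Active surveillance")],
    entries=["risk  stratification"],
    terminals=["Active surveillance"],
)
nodes, edges = aggregate([page_a, page_b])
```

In this sketch the node shared by both chunks is merged into a single entry whose provenance lists both page ranges, which is the property that would let a reviewer trace each merged node or edge back to its source pages.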

Abstract

Clinical practice guidelines are long, multimodal documents whose branching recommendations are difficult to convert into executable clinical decision support (CDS), and one-shot parsing often breaks cross-page continuity. Recent LLM/VLM extractors are mostly local or text-centric, under-specifying section interfaces and failing to consolidate cross-page control flow across full documents into one coherent decision graph. We present a decomposition-first pipeline that converts full-guideline evidence into an executable clinical decision graph through topology-aware chunking, interface-constrained chunk graph generation, and provenance-preserving global aggregation. Rather than relying on single-pass generation, the pipeline uses explicit entry/terminal interfaces and semantic deduplication to preserve cross-page continuity while keeping the induced control flow auditable and structurally consistent. We evaluate on an adjudicated prostate-guideline benchmark with matched inputs and the same underlying VLM backbone across compared methods. On the complete merged graph, our approach improves edge and triplet precision/recall from 19.6%/16.1% in existing models to 69.0%/87.5%, while node recall rises from 78.1% to 93.8%. These results support decomposition-first, auditable guideline-to-CDS conversion on this benchmark, while current evidence remains limited to one adjudicated prostate guideline and motivates broader multi-guideline validation.
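
The graph-quality metrics named in the abstract can be illustrated with a short sketch. It assumes exact set matching between predicted and adjudicated graph elements, with edges as (source, target) pairs and triplets as (source, condition, target) tuples. The abstract does not describe the benchmark's actual matching protocol, so the function name `precision_recall` and the toy data below are hypothetical.

```python
def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Set-based precision and recall under exact matching (assumption)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy adjudicated (gold) triplets and a toy prediction; not benchmark data.
gold_triplets = {
    ("A", "yes", "B"),
    ("A", "no", "C"),
    ("B", "high", "D"),
    ("B", "low", "E"),
}
predicted_triplets = {
    ("A", "yes", "B"),
    ("B", "high", "D"),
    ("C", "any", "E"),   # spurious triplet, lowers precision
}

# Edge-level scores ignore the condition label.
gold_edges = {(src, dst) for src, _, dst in gold_triplets}
predicted_edges = {(src, dst) for src, _, dst in predicted_triplets}

# Node recall: share of gold nodes that appear anywhere in the prediction.
gold_nodes = {n for src, dst in gold_edges for n in (src, dst)}
predicted_nodes = {n for src, dst in predicted_edges for n in (src, dst)}

triplet_p, triplet_r = precision_recall(predicted_triplets, gold_triplets)
edge_p, edge_r = precision_recall(predicted_edges, gold_edges)
_, node_r = precision_recall(predicted_nodes, gold_nodes)
```

In this toy case every gold node is recovered while half of the gold edges are missed, which shows how node recall can stay high while edge and triplet scores are much lower.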