The Provenance Gap in Clinical AI: Evidence-Traceable Temporal Knowledge Graphs for Rare Disease Reasoning
arXiv cs.CL / 4/21/2026
Key Points
- The paper identifies a “Provenance Gap” in clinical AI: frontier LLMs rarely provide PubMed identifiers unless explicitly prompted, and when they do cite, the citations are often plausible but fabricated.
- In evaluations across rare neuromuscular disease scenarios, even the best LLM produced only 15.3% relevant PMIDs when asked to cite, and many citations pointed to unrelated publications.
- The authors propose HEG-TKG (Hierarchical Evidence-Grounded Temporal Knowledge Graphs), which grounds clinical claims in a temporally structured evidence graph built from 4,512 PubMed records plus curated sources and disease-trajectory milestones.
- In a controlled three-arm comparison, HEG-TKG preserved baseline clinical feature coverage while achieving 100% evidence verifiability across 203 inline citations, outperforming a guideline-RAG baseline (which produced zero verifiable citations); LLM judges distinguished the systems primarily by citation quality.
- A counterfactual test suggests HEG-TKG is highly robust to injected clinical errors (80% resistance), and flagged issues can be traced reliably through citations; on-premise deployment with open-source models keeps patient data within institutional infrastructure.
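The core mechanic described above — grounding each clinical claim in inline PMID citations that resolve against a curated evidence graph — can be sketched roughly as follows. This is an illustrative toy in Python, not the paper's actual implementation; all names (`EvidenceNode`, `Claim`, `verifiability`) and the data are hypothetical.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: each claim carries inline PMID citations, and
# "evidence verifiability" is the share of claims whose every cited PMID
# resolves to a node in the evidence graph.

@dataclass
class EvidenceNode:
    pmid: str
    year: int       # supports temporal ordering of evidence
    finding: str

@dataclass
class Claim:
    text: str
    citations: list[str] = field(default_factory=list)  # inline PMIDs

def verifiability(claims: list[Claim], evidence_index: dict[str, EvidenceNode]) -> float:
    """Fraction of claims whose citations all resolve in the evidence graph."""
    if not claims:
        return 0.0
    ok = sum(
        1 for c in claims
        if c.citations and all(p in evidence_index for p in c.citations)
    )
    return ok / len(claims)

# Toy evidence graph keyed by PMID (fabricated identifiers for illustration)
index = {
    "12345678": EvidenceNode("12345678", 2019, "disease-trajectory milestone"),
    "23456789": EvidenceNode("23456789", 2022, "treatment response"),
}
claims = [
    Claim("Loss of ambulation typically precedes respiratory decline.", ["12345678"]),
    Claim("Therapy Y slows functional decline.", ["23456789"]),
]
print(verifiability(claims, index))  # 1.0 when every citation resolves
```

Under this framing, the paper's 100% verifiability result means every one of the 203 inline citations resolved to a record in the evidence base, whereas an unverifiable citation (a hallucinated or unrelated PMID) would lower the score.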