From Evidence-Based Medicine to Knowledge Graph: Retrieval-Augmented Generation for Sports Rehabilitation and a Domain Benchmark

arXiv cs.CL / 3/27/2026

💬 OpinionSignals & Early TrendsIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper argues that existing medical RAG systems miss key evidence-based medicine (EBM) requirements, specifically PICO alignment and evidence hierarchy during reranking.
It introduces SR-RAG, an EBM-adapted GraphRAG framework that incorporates the PICO framework into knowledge-graph construction and retrieval.
The study proposes Bayesian Evidence Tier Reranking (BETR) to calibrate ranking scores according to evidence grade without relying on predefined weights.
Experiments in sports rehabilitation show strong retrieval, faithfulness, and semantic metrics, including 0.812 evidence recall@10, 0.819 answer faithfulness, and 0.788 PICOT match accuracy.
The authors release a large sports-rehabilitation knowledge graph (357,844 nodes, 371,226 edges) plus a benchmark dataset of 1,637 QA pairs, supported by clinician Likert ratings and human-verified evaluation.

Abstract

Current medical retrieval-augmented generation (RAG) approaches overlook evidence-based medicine (EBM) principles, leading to two key gaps: (1) the lack of PICO alignment between queries and retrieved evidence, and (2) the absence of evidence hierarchy considerations during reranking. We present SR-RAG, an EBM-adapted GraphRAG framework that integrates the PICO framework into knowledge graph construction and retrieval, and proposes Bayesian Evidence Tier Reranking (BETR) to calibrate ranking scores by evidence grade without predefined weights. Validated in sports rehabilitation, we release a knowledge graph (357,844 nodes, 371,226 edges) and a benchmark of 1,637 QA pairs. SR-RAG achieves 0.812 evidence recall@10, 0.830 nugget coverage, 0.819 answer faithfulness, 0.882 semantic similarity, and 0.788 PICOT match accuracy, substantially outperforming five baselines. Five expert clinicians rated the system 4.66--4.84 on a 5-point Likert scale, and system rankings are preserved on a human-verified gold subset (n=80).

💡 Insights using this article

This article is featured in our daily AI news digest — key takeaways and action items at a glance.

📅 3/27DailyView insight →

Black Hat Asia

AI Business

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

Dev.to

I shipped Google's TurboQuant as a vLLM plugin 72 hours after the paper — here's what nobody else tested

Dev.to

We built a governance layer for AI-assisted development (with runtime validation and real system)

Dev.to

No AI system using the forward inference pass can ever be conscious.

Reddit r/artificial

From Evidence-Based Medicine to Knowledge Graph: Retrieval-Augmented Generation for Sports Rehabilitation and a Domain Benchmark

Key Points

Abstract

💡 Insights using this article

Related Articles

Black Hat Asia

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

I shipped Google's TurboQuant as a vLLM plugin 72 hours after the paper — here's what nobody else tested

We built a governance layer for AI-assisted development (with runtime validation and real system)

No AI system using the forward inference pass can ever be conscious.

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer