AI Navigate

CTG-DB: An Ontology-Based Transformation of ClinicalTrials.gov to Enable Cross-Trial Drug Safety Analyses

arXiv cs.CL / 3/18/2026

Key Points

  • CTG-DB ingests the complete ClinicalTrials.gov XML archive and outputs a relational database aligned to MedDRA terminology to standardize adverse event data for pharmacovigilance.
  • The pipeline preserves arm-level denominators and represents placebo and comparator arms to enable cross-trial safety analyses.
  • Adverse event terminology is normalized using deterministic exact and fuzzy matching to ensure transparent, reproducible mappings across trials.
  • The framework enables concept-level retrieval and cross-trial aggregation for scalable placebo-referenced safety analyses and integration into downstream pharmacovigilance signal detection.
  • CTG-DB is released as open source, which makes it easier to integrate clinical trial evidence into pharmacovigilance workflows and to reproduce safety analyses.
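
The exact-then-fuzzy normalization step described above can be illustrated with a minimal sketch. This is not CTG-DB's implementation: the vocabulary, the placeholder codes, the `normalize` and `map_term` helpers, and the 0.85 similarity cutoff are all assumptions made for illustration, and Python's standard `difflib` stands in for whatever matcher the pipeline actually uses. MedDRA itself is licensed, so no real MedDRA codes appear here.

```python
import difflib
import re

# Hypothetical stand-in for a MedDRA term list; the codes are placeholders,
# not real MedDRA identifiers.
VOCAB = {
    "nausea": "C001",
    "headache": "C002",
    "diarrhoea": "C003",
    "vomiting": "C004",
}


def normalize(text):
    """Lowercase, replace non-alphanumeric runs with a space, and trim."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def map_term(raw, vocab=VOCAB, cutoff=0.85):
    """Map an investigator-reported term to (code, matched_term, method, score).

    Exact matching on the normalized string is tried first. Fuzzy matching
    is used only as a fallback, and the method and score are returned so
    every mapping can be audited. The same input always yields the same
    output, since no randomness or learned model is involved.
    """
    key = normalize(raw)
    if key in vocab:
        return vocab[key], key, "exact", 1.0
    matches = difflib.get_close_matches(key, list(vocab), n=1, cutoff=cutoff)
    if matches:
        best = matches[0]
        score = difflib.SequenceMatcher(None, key, best).ratio()
        return vocab[best], best, "fuzzy", round(score, 3)
    return None, None, "unmapped", 0.0


for reported in ["Nausea ", "Diarrhea", "rash"]:
    print(reported, "->", map_term(reported))
```

Returning the match method alongside the code keeps fuzzy mappings distinguishable from exact ones, so a reviewer can inspect or override them; the cutoff is a tunable assumption rather than a recommended value.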

Abstract

ClinicalTrials.gov (CT.gov) is the largest publicly accessible registry of clinical studies, yet its registry-oriented architecture and heterogeneous adverse event (AE) terminology limit systematic pharmacovigilance (PV) analytics. AEs are typically recorded as investigator-reported text rather than standardized identifiers, requiring manual reconciliation to identify coherent safety concepts. We present the ClinicalTrials.gov Transformation Database (CTG-DB), an open-source pipeline that ingests the complete CT.gov XML archive and produces a relational database aligned to standardized AE terminology using the Medical Dictionary for Regulatory Activities (MedDRA). CTG-DB preserves arm-level denominators, represents placebo and comparator arms, and normalizes AE terminology using deterministic exact and fuzzy matching to ensure transparent and reproducible mappings. This framework enables concept-level retrieval and cross-trial aggregation for scalable placebo-referenced safety analyses and integration of clinical trial evidence into downstream PV signal detection.
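
To show why preserved arm-level denominators and explicit placebo arms matter, the following sketch computes placebo-referenced incidence for one normalized AE concept across trials. The row layout, the `arm_role` labels, the function names, and all counts are hypothetical; they are not CTG-DB's schema or data. The sketch also assumes one arm per role per trial.

```python
from collections import defaultdict

# Hypothetical arm-level rows:
# (trial_id, arm_role, concept, n_affected, n_at_risk)
ROWS = [
    ("T1", "active", "nausea", 12, 100),
    ("T1", "placebo", "nausea", 5, 100),
    ("T2", "active", "nausea", 30, 200),
    ("T2", "placebo", "nausea", 8, 200),
]


def per_trial_risk_difference(rows, concept):
    """Return {trial_id: incidence(active) - incidence(placebo)}.

    Trials missing either arm for the concept are skipped.
    """
    by_trial = defaultdict(dict)
    for trial, role, row_concept, affected, at_risk in rows:
        if row_concept == concept and at_risk > 0:
            by_trial[trial][role] = affected / at_risk
    return {
        trial: arms["active"] - arms["placebo"]
        for trial, arms in by_trial.items()
        if "active" in arms and "placebo" in arms
    }


def pooled_incidence(rows, concept, role):
    """Sum events and denominators across trials for one arm role."""
    affected = sum(r[3] for r in rows if r[2] == concept and r[1] == role)
    at_risk = sum(r[4] for r in rows if r[2] == concept and r[1] == role)
    return affected / at_risk if at_risk else None


print(per_trial_risk_difference(ROWS, "nausea"))
print(pooled_incidence(ROWS, "nausea", "active"))
print(pooled_incidence(ROWS, "nausea", "placebo"))
```

Naive pooling of this kind ignores differences between trials in population, dose, and follow-up, so a real analysis would typically use a stratified or meta-analytic estimator. The point here is only that neither quantity can be computed without per-arm denominators.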