Code Sharing In Prediction Model Research: A Scoping Review

arXiv cs.AI / 4/10/2026


Key Points

  • A scoping review of PubMed Central Open Access prediction model papers found that only 12.2% included code-sharing statements, though this increased over time to 15.8% in 2025.
  • Code sharing was more prevalent in studies citing TRIPOD+AI than in studies citing TRIPOD alone, with substantial variation across journals and countries.
  • The study used an LLM-assisted pipeline to extract code availability statements and evaluate repositories, revealing major heterogeneity in reproducibility-related features.
  • While most repositories included a README (80.5%), fewer specified dependencies (37.6%), constrained versions (21.6%), or used modular structure (42.4%), limiting reusability.
  • The results aim to support development of TRIPOD-Code, a reporting guideline extension that goes beyond “code availability” to require clearer expectations for documentation, dependencies, licensing, and executable structure.
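The extraction step described above can be illustrated with a simplified, rule-based stand-in. The study itself used an LLM-assisted pipeline for this task; the sketch below only shows the narrower sub-problem of pulling code-hosting URLs out of an availability statement with a regular expression. The hosting domains and the example URL are illustrative assumptions, not taken from the paper.

```python
import re

# Naive pattern for common code-hosting links found in availability statements.
# The domain list is an assumption for illustration, not the study's actual set.
REPO_LINK = re.compile(
    r"https?://(?:www\.)?(?:github\.com|gitlab\.com|zenodo\.org|osf\.io)/\S+",
    re.IGNORECASE,
)

def extract_repo_links(statement: str) -> list[str]:
    """Return code-hosting URLs mentioned in a code availability statement."""
    # Trailing punctuation often clings to URLs in running text; strip it off.
    return [m.rstrip(".,);") for m in REPO_LINK.findall(statement)]

# Hypothetical example statement (the repository URL is made up).
statement = ("The analysis code is freely available at "
             "https://github.com/example/prediction-model.")
print(extract_repo_links(statement))
```

A rule-based extractor like this handles well-formed statements but misses paraphrased or indirect availability claims, which is presumably why the review used an LLM for screening and extraction instead.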

Abstract

Analytical code is essential for reproducing diagnostic and prognostic prediction model research, yet code availability in the published literature remains limited. While the TRIPOD statements set standards for reporting prediction model methods, they do not define explicit standards for repository structure and documentation. This review quantifies current code-sharing practices to inform the development of TRIPOD-Code, a TRIPOD extension reporting guideline focused on code sharing. We conducted a scoping review of PubMed-indexed articles citing TRIPOD or TRIPOD+AI as of Aug 11, 2025, restricted to studies retrievable via the PubMed Central Open Access API. Eligible studies developed, updated, or validated multivariable prediction models. A large language model-assisted pipeline was developed to screen articles and extract code availability statements and repository links. Repositories were assessed with the same LLM against 14 predefined reproducibility-related features. Our code is publicly available. Among 3,967 eligible articles, 12.2% included code-sharing statements. Code sharing increased over time, reaching 15.8% in 2025, and was higher among TRIPOD+AI-citing studies than TRIPOD-citing studies. Sharing prevalence varied widely by journal and country. Repository assessment showed substantial heterogeneity in reproducibility features: most repositories contained a README file (80.5%), but fewer specified dependencies (37.6%; version-constrained 21.6%) or were modular (42.4%). In prediction model research, code sharing remains relatively uncommon, and shared code often falls short of being reusable. These findings provide an empirical baseline for the TRIPOD-Code extension and underscore the need for clearer expectations beyond code availability, including documentation, dependency specification, licensing, and executable structure.