Adaptive Gaussian Process Search for Simulation-Based Sample Size Estimation in Clinical Prediction Models: Validation of the pmsims R Package

arXiv stat.ML / 3/26/2026



Key Points

The article presents and validates pmsims, an R package that uses Gaussian process surrogate modeling to estimate sample sizes for clinical prediction models via a flexible simulation-based framework.
In simulation studies across binary, continuous, and survival outcomes, the Gaussian process-based adaptive search engine delivered the most stable sample size recommendations, especially under low-signal and high-dimensional conditions.
Benchmarking against existing analytical (pmsampsize) and simulation-based (samplesizedev) methods showed pmsims achieving performance close to target criteria while reducing computational burden.
The authors report that the best pmsims approach can match simulation-based approaches in difficult scenarios and generally outperform purely analytical methods, while requiring fewer model evaluations than non-adaptive simulation strategies.
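The idea of a Gaussian process surrogate can be illustrated with a small, self-contained sketch. This is not the pmsims implementation or its API: the function names, the fixed kernel settings, and the performance values are all hypothetical, chosen only to show how a surrogate fitted to a few noisy simulation averages can be read off to suggest a sample size. An adaptive search would presumably repeat this step, running new simulations near the suggested size and refitting.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel with unit variance (assumed, not tuned)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_posterior_mean(x_train, y_train, x_new, noise=1e-5):
    """Posterior mean of a GP with a constant mean and fixed hyperparameters."""
    mean_y = sum(y_train) / len(y_train)
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)]
         for i, a in enumerate(x_train)]
    alpha = solve(K, [y - mean_y for y in y_train])
    return [mean_y + sum(rbf(x, a) * w for a, w in zip(x_train, alpha))
            for x in x_new]

def smallest_n_meeting_target(observed, target, grid):
    """Smallest grid sample size whose surrogate mean reaches the target."""
    sizes = sorted(observed)
    x = [math.log(n) for n in sizes]   # model performance on the log-n scale
    y = [observed[n] for n in sizes]
    mean = gp_posterior_mean(x, y, [math.log(n) for n in grid])
    for n, m in zip(grid, mean):
        if m >= target:
            return n
    return None

if __name__ == "__main__":
    # Hypothetical simulation averages (sample size -> mean performance),
    # loosely following a toy learning curve 0.80 - 12 / n.
    observed = {50: 0.562, 100: 0.681, 200: 0.739,
                400: 0.771, 800: 0.786, 1600: 0.792}
    print(smallest_n_meeting_target(observed, 0.75, list(range(50, 1601, 10))))
```

Working on the log scale of the sample size is a design choice of this sketch: performance tends to change quickly at small sizes and slowly at large ones, so a single length scale fits better there.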

Abstract

Background: Determining an adequate sample size is essential for developing reliable and generalisable clinical prediction models, yet practical guidance on selecting appropriate methods remains limited. Existing analytical and simulation-based approaches often rely on restrictive assumptions and focus on mean-based criteria. We present and validate pmsims, an R package that uses Gaussian process surrogate modelling to provide a flexible and computationally efficient simulation-based framework for sample size determination across diverse prediction settings.

Methods: We conducted a comprehensive simulation study with two aims. First, we compared three search engines implemented in pmsims: a Gaussian process-based adaptive method, a deterministic bisection method, and a hybrid approach, across binary, continuous, and survival outcomes. Second, we benchmarked the best-performing pmsims engine against existing analytical (pmsampsize) and simulation-based (samplesizedev) methods, evaluating recommended sample sizes, computational time, and achieved performance on large independent validation datasets.

Results: The Gaussian process-based method consistently produced the most stable sample size estimates, particularly in low-signal, high-dimensional settings. In benchmarking, pmsims achieved performance close to prespecified targets across all outcome types, matching simulation-based approaches and outperforming analytical methods in more challenging scenarios.

Conclusions: pmsims provides an efficient and flexible framework for principled sample size planning in clinical prediction modelling, requiring fewer model evaluations than non-adaptive simulation approaches.
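To make the bisection idea from the Methods concrete, here is a minimal sketch of a bisection search over sample size when each evaluation is a noisy simulation average. It is not the pmsims bisection engine: the learning curve `0.80 - 12 / n`, the noise level, and every function name are hypothetical stand-ins for a real simulate-fit-validate step.

```python
import math
import random
import statistics

def simulated_performance(n, rng):
    """One toy replicate: a hypothetical learning curve plus Monte Carlo noise.

    A real study would simulate data of size n, fit the prediction model,
    and measure performance on validation data instead.
    """
    expected = 0.80 - 12.0 / n
    return expected + rng.gauss(0.0, 0.5 / math.sqrt(n))

def mean_performance(n, reps, rng):
    """Average performance over `reps` simulated replicates at sample size n."""
    return statistics.fmean(simulated_performance(n, rng) for _ in range(reps))

def bisection_sample_size(target, lo, hi, reps=200, seed=1):
    """Bisection for the smallest n whose estimated mean performance meets target.

    Assumes performance increases with n and that `hi` is large enough.
    """
    rng = random.Random(seed)
    if mean_performance(hi, reps, rng) < target:
        raise ValueError("upper bound does not reach the target")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mean_performance(mid, reps, rng) >= target:
            hi = mid   # target met: the answer is at or below mid
        else:
            lo = mid   # target missed: the answer is above mid
    return hi

if __name__ == "__main__":
    # For this toy curve the noise-free answer is n = 240.
    print(bisection_sample_size(target=0.75, lo=20, hi=5000))
```

Each comparison in the loop relies on a noisy average, so a single unlucky estimate near the target can push the search in the wrong direction with no way to recover. That is one plausible reason a smoothed surrogate could give more stable recommendations, although this toy example does not test that claim.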

