fastml: Guarded Resampling Workflows for Safer Automated Machine Learning in R
arXiv cs.LG / 4/8/2026
💬 Opinion | Tools & Practical Usage | Models & Research
Key Points
- The paper explains that preprocessing leakage can occur when data-dependent transformations (like scaling or imputation) are estimated before resampling, which can artificially inflate model performance estimates.
- It introduces fastml, an R package offering leakage-aware, single-call machine learning via “guarded resampling,” where preprocessing is re-estimated within each resample (on that resample's training portion) and then applied to that fold’s assessment data only.
- fastml supports grouped and time-ordered resampling, blocks high-risk preprocessing configurations, audits recipes for external dependencies, runs in sandboxed execution, and integrates model explanation.
- In evaluations, a Monte Carlo simulation shows that global preprocessing can substantially overstate performance compared with fold-local guarded resampling.
- The authors report that fastml achieves held-out performance comparable to tidymodels under matched specifications while simplifying orchestration and enabling consistent survival-model benchmarking through one unified interface.
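
To make the leakage mechanism in the points above concrete, here is a minimal, language-agnostic sketch in Python (standard library only). It is not fastml's API and does not reproduce the paper's simulation. All function names (`fit_scaler`, `apply_scaler`, `k_fold_indices`, `guarded_resample`, `leaky_resample`) are hypothetical, and contiguous k-fold splitting with mean/standard-deviation scaling is an assumption chosen for illustration. The guarded version estimates scaling parameters from each fold's training rows only, while the leaky version estimates them once from all rows before splitting.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Estimate centering and scaling parameters from the given values only."""
    mu = mean(values)
    sd = pstdev(values) or 1.0  # fall back to 1.0 for constant input
    return mu, sd


def apply_scaler(values, params):
    """Standardize values with previously estimated parameters."""
    mu, sd = params
    return [(v - mu) / sd for v in values]


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous assessment folds (illustrative)."""
    bounds = [round(i * n / k) for i in range(k + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(k)]


def guarded_resample(values, k):
    """Fold-local preprocessing: fit on training rows, apply to assessment rows."""
    results = []
    for assess_idx in k_fold_indices(len(values), k):
        held_out = set(assess_idx)
        training = [v for i, v in enumerate(values) if i not in held_out]
        params = fit_scaler(training)  # assessment rows are never seen here
        assessment = [values[i] for i in assess_idx]
        results.append((params, apply_scaler(assessment, params)))
    return results


def leaky_resample(values, k):
    """Global preprocessing: parameters are estimated once from ALL rows."""
    params = fit_scaler(values)  # assessment rows influence the parameters
    return [
        (params, apply_scaler([values[i] for i in idx], params))
        for idx in k_fold_indices(len(values), k)
    ]
```

In this sketch, changing a value that falls in a fold's assessment set leaves that fold's guarded parameters untouched but shifts the global parameters used by the leaky version. That dependence on held-out rows is what allows performance estimates to be inflated.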



