fastml: Guarded Resampling Workflows for Safer Automated Machine Learning in R

arXiv cs.LG / 4/8/2026

💬 Opinion · Tools & Practical Usage · Models & Research

Key Points

  • The paper explains that preprocessing leakage can occur when data-dependent transformations (such as scaling or imputation) are estimated on the full dataset before resampling, which can artificially inflate model performance estimates.
  • It introduces fastml, an R package offering leakage-aware, single-call machine learning via “guarded resampling,” where preprocessing is re-estimated within each resample and then applied only to that fold’s assessment data.
  • fastml supports grouped and time-ordered resampling, blocks high-risk preprocessing configurations, audits recipes for external dependencies, and uses sandboxed execution plus integrated model explanation.
  • In evaluations, a Monte Carlo simulation shows that global preprocessing can substantially overstate performance compared with fold-local guarded resampling.
  • The authors report that fastml achieves held-out performance comparable to tidymodels under matched specifications while simplifying orchestration and enabling consistent survival-model benchmarking through one unified interface.
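To make the guarded-resampling idea concrete, here is a minimal sketch of the difference between global and fold-local standardization. fastml is an R package and this is not its API; the snippet is an illustrative Python analogue showing why statistics estimated inside each resample cannot leak information from the assessment fold.

```python
# Sketch of "guarded resampling": preprocessing statistics are re-estimated
# inside each resample and applied only to that fold's assessment data.
# Illustrative analogue of the idea, not the fastml package's interface.
from statistics import mean, pstdev

def standardize(train, assess):
    """Estimate mean/sd on the analysis (training) split only,
    then apply them to both splits -- the fold-local, leakage-free pattern."""
    mu, sd = mean(train), pstdev(train) or 1.0
    scale = lambda xs: [(x - mu) / sd for x in xs]
    return scale(train), scale(assess), (mu, sd)

# Toy data and a simple 2-fold split; an extreme value sits in fold 2.
x = [1.0, 2.0, 3.0, 100.0]
fold1, fold2 = x[:2], x[2:]

# Leaky (global) preprocessing: the statistics also see the assessment fold.
g_mu, g_sd = mean(x), pstdev(x)

# Guarded (fold-local): fold 2 is assessment data, so its extreme value
# never influences the scaling parameters.
_, assess_scaled, (mu, sd) = standardize(fold1, fold2)

print(mu)    # fold-local mean: 1.5
print(g_mu)  # global (leaky) mean: 26.5
```

Under global preprocessing, every fold's transformation depends on the held-out observations, so the resampling estimate no longer mimics deployment; the fold-local version only ever uses parameters that would have been available at training time.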

Abstract

Preprocessing leakage arises when scaling, imputation, or other data-dependent transformations are estimated before resampling, inflating apparent performance while remaining hard to detect. We present fastml, an R package that provides a single-call interface for leakage-aware machine learning through guarded resampling, where preprocessing is re-estimated inside each resample and applied to the corresponding assessment data. The package supports grouped and time-ordered resampling, blocks high-risk configurations, audits recipes for external dependencies, and includes sandboxed execution and integrated model explanation. We evaluate fastml with a Monte Carlo simulation contrasting global and fold-local normalization, a usability comparison with tidymodels under matched specifications, and survival benchmarks across datasets of different sizes. The simulation demonstrates that global preprocessing substantially inflates apparent performance relative to guarded resampling. fastml matched held-out performance obtained with tidymodels while reducing workflow orchestration, and it supported consistent benchmarking of multiple survival model classes through a unified interface.