A few weeks ago I was working on a training run that produced garbage results.
No errors, no crashes, just a model that learned nothing. Three days later I found it: label leakage between the train and validation splits. The model had been cheating the whole time.
So I built preflight. It's a CLI tool you run before training starts that catches the silent stuff: NaNs, label leakage, wrong channel ordering, dead gradients, class imbalance, plus a VRAM estimate. Ten checks total, across fatal/warn/info severity tiers. It exits with code 1 on any fatal failure, so it can block CI.
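To give a feel for what a check like this does under the hood, here's a minimal sketch of train/val overlap detection by hashing raw sample bytes. This is my own illustration, not preflight's actual implementation; the function names are hypothetical.

```python
import hashlib
import numpy as np

def sample_hashes(arr):
    """Hash each row's raw bytes so identical samples collide."""
    return {hashlib.sha1(np.ascontiguousarray(row).tobytes()).hexdigest()
            for row in arr}

def count_overlap(train_x, val_x):
    """Count validation samples that also appear verbatim in train."""
    return len(sample_hashes(train_x) & sample_hashes(val_x))
```

Exact byte-level hashing only catches verbatim duplicates; near-duplicate leakage (resized images, re-encoded audio) needs fuzzier matching.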
```
pip install preflight-ml
preflight run --dataloader my_dataloader.py
```
It's very early (v0.1.1, just pushed it), so I'd genuinely love feedback on which checks matter most to people, what I've missed, and what's wrong with the current approach. If anyone wants to contribute a check or two, that'd be even better: each one just needs a passing test, a failing test, and a fix hint.
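For contributors, a check boils down to something like the shape below. This is a hypothetical sketch of the pattern (the real plugin interface in the repo may differ); the function and its return convention are my own invention for illustration.

```python
import numpy as np

def check_no_nans(batch):
    """Fatal check: fail if any value in the batch is NaN or inf.

    Returns (ok, fix_hint) -- the hint is shown only on failure.
    """
    if np.isfinite(batch).all():
        return True, ""
    return False, "Inspect your preprocessing; a divide-by-zero or bad normalization often introduces NaNs."

# Passing test: clean data.
clean = np.ones((4, 3))

# Failing test: one NaN slipped in.
dirty = np.ones((4, 3))
dirty[0, 0] = np.nan
```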
GitHub: https://github.com/Rusheel86/preflight
PyPI: https://pypi.org/project/preflight-ml/
Not trying to replace pytest or Deepchecks, just fill the gap between "my code runs" and "my training will actually work."




