I built a third-party quality rating system for ML datasets. It combines a multi-oracle scoring panel (7 scorers across 5 algorithm families), conformal prediction intervals on downstream F1, Ed25519-signed certificates, and a contamination check against 40+ public evals (MMLU, HumanEval, GSM8K, MedQA, LegalBench, etc.).
Methodology paper, CC BY 4.0: https://labelsets.ai/paper
Free audit (paste any HF dataset URL): https://labelsets.ai/rate
Public verification API, no auth: GET /api/verify-lqs-cert/:hash
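For anyone who wants to script against the verification endpoint, here's a minimal sketch. The URL path comes from the post; the shape of the JSON response (`"valid"`, etc.) is my assumption, not a documented schema:

```python
# Hypothetical client for the no-auth cert verification endpoint.
# Only the path /api/verify-lqs-cert/:hash is from the post; the
# response fields are guesses at the schema.
import json
import urllib.request

BASE = "https://labelsets.ai"

def verify_url(cert_hash: str) -> str:
    """Build the GET URL for a given cert hash."""
    return f"{BASE}/api/verify-lqs-cert/{cert_hash}"

def verify_cert(cert_hash: str) -> dict:
    """Fetch and decode the verification response (network call)."""
    with urllib.request.urlopen(verify_url(cert_hash)) as resp:
        return json.load(resp)

# Example URL shape (no network needed):
# verify_url("abc123") -> "https://labelsets.ai/api/verify-lqs-cert/abc123"
```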
Calibration corpus is at ~1,000 datasets and growing toward 10,000 by Q3 2026 — where calibration is thin, the cert says so out loud rather than fabricating confidence.
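To make the conformal part concrete, here's a toy split-conformal sketch for an interval on downstream F1. The absolute-residual score and `alpha` are my illustrative choices; the paper's actual procedure (score function, normalization, quantile method) may differ:

```python
# Toy split-conformal interval: calibrate on held-out (true, predicted)
# F1 pairs, then form [y_pred - q, y_pred + q] for a new prediction.
# All numbers here are made up for illustration.
import math

def conformal_interval(cal_true, cal_pred, y_pred, alpha=0.1):
    """Absolute-residual split conformal with finite-sample correction."""
    n = len(cal_true)
    residuals = sorted(abs(t - p) for t, p in zip(cal_true, cal_pred))
    # Rank of the conformal quantile: ceil((n+1)(1-alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    q = residuals[min(k, n) - 1]
    lo, hi = y_pred - q, y_pred + q
    # F1 lives in [0, 1], so clip the interval.
    return max(0.0, lo), min(1.0, hi)
```

With a small calibration set this correctly degrades to a wide (uninformative) interval, which matches the "say so out loud" behavior when calibration is thin.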
Happy to take feedback on the dimension list, the oracle agreement math (Cohen + Fleiss κ reporting), or the conformal prediction calibration. The methodology paper has the full spec; if we got the math wrong anywhere, we want to know.
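For reference, the Fleiss' κ side of the agreement reporting can be sketched like this, assuming each of the 7 oracles assigns one of k discrete quality labels per item (the counts below are invented for illustration):

```python
# Fleiss' kappa from a counts matrix: counts[i][j] = number of raters
# assigning item i to category j. Every row must sum to the same
# number of raters n (7 oracles here). Illustrative only.

def fleiss_kappa(counts):
    N = len(counts)          # number of items
    n = sum(counts[0])       # raters per item
    # Mean per-item agreement P_bar.
    p_bar = sum(
        (sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts
    ) / N
    # Chance agreement P_e from marginal category proportions.
    k = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(k)]
    p_e = sum((t / (N * n)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)
```

Perfect agreement gives κ = 1, and maximally split ratings push κ negative, which is the sanity check I'd run first against whatever the paper reports.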