I built a third-party quality rating system for ML datasets. It combines a multi-oracle scoring panel (7 scorers across 5 algorithm families), conformal prediction intervals on downstream F1, Ed25519-signed certificates, and a contamination check against 40+ public evals (MMLU, HumanEval, GSM8K, MedQA, LegalBench, etc.).
Methodology paper, CC BY 4.0: https://labelsets.ai/paper
Free audit (paste any HF dataset URL): https://labelsets.ai/rate
Public verification API, no auth: GET /api/verify-lqs-cert/:hash
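For anyone who wants to script against the verification endpoint, here's a minimal sketch. The URL path comes from the post; the shape of the JSON response (`"valid"`, etc.) is my assumption, not a documented schema:

```python
# Hypothetical client for the no-auth cert verification endpoint.
# Only the path /api/verify-lqs-cert/:hash is from the post; the
# response fields are guesses at the schema.
import json
import urllib.request

BASE = "https://labelsets.ai"

def verify_url(cert_hash: str) -> str:
    """Build the GET URL for a given cert hash."""
    return f"{BASE}/api/verify-lqs-cert/{cert_hash}"

def verify_cert(cert_hash: str) -> dict:
    """Fetch and decode the verification response (network call)."""
    with urllib.request.urlopen(verify_url(cert_hash)) as resp:
        return json.load(resp)

# Example URL shape (no network needed):
# verify_url("abc123") -> "https://labelsets.ai/api/verify-lqs-cert/abc123"
```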
Calibration corpus is at ~1,000 datasets and growing toward 10,000 by Q3 2026 — where calibration is thin, the cert says so out loud rather than fabricating confidence.
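To make the conformal part concrete, here's a toy split-conformal sketch for an interval on downstream F1. The absolute-residual score and `alpha` are my illustrative choices; the paper's actual procedure (score function, normalization, quantile method) may differ:

```python
# Toy split-conformal interval: calibrate on held-out (true, predicted)
# F1 pairs, then form [y_pred - q, y_pred + q] for a new prediction.
# All numbers here are made up for illustration.
import math

def conformal_interval(cal_true, cal_pred, y_pred, alpha=0.1):
    """Absolute-residual split conformal with finite-sample correction."""
    n = len(cal_true)
    residuals = sorted(abs(t - p) for t, p in zip(cal_true, cal_pred))
    # Rank of the conformal quantile: ceil((n+1)(1-alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    q = residuals[min(k, n) - 1]
    lo, hi = y_pred - q, y_pred + q
    # F1 lives in [0, 1], so clip the interval.
    return max(0.0, lo), min(1.0, hi)
```

With a small calibration set this correctly degrades to a wide (uninformative) interval, which matches the "say so out loud" behavior when calibration is thin.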
Happy to take feedback on the dimension list, the oracle agreement math (Cohen + Fleiss κ reporting), or the conformal prediction calibration. The methodology paper has the full spec; if we got the math wrong anywhere, we want to know.
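For reference, the Fleiss' κ side of the agreement reporting can be sketched like this, assuming each of the 7 oracles assigns one of k discrete quality labels per item (the counts below are invented for illustration):

```python
# Fleiss' kappa from a counts matrix: counts[i][j] = number of raters
# assigning item i to category j. Every row must sum to the same
# number of raters n (7 oracles here). Illustrative only.

def fleiss_kappa(counts):
    N = len(counts)          # number of items
    n = sum(counts[0])       # raters per item
    # Mean per-item agreement P_bar.
    p_bar = sum(
        (sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts
    ) / N
    # Chance agreement P_e from marginal category proportions.
    k = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(k)]
    p_e = sum((t / (N * n)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)
```

Perfect agreement gives κ = 1, and maximally split ratings push κ negative, which is the sanity check I'd run first against whatever the paper reports.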