Free tool I built to score dataset quality (LQS) — feedback welcome [D]

Reddit r/MachineLearning / 4/9/2026


Key Points

  • A free standalone tool has been released to compute a Label Quality Score (LQS) for uploaded datasets, returning a 0–100 rating broken down across seven quality dimensions.
  • The system provides actionable flags indicating specific factors that are degrading dataset quality.
  • It supports common machine-learning dataset formats including CSV, Parquet, JSONL, COCO JSON, and YOLO.
  • The developer is inviting professional dataset practitioners to validate whether the scoring methodology makes sense and to share feedback or discuss the approach.

We built a Label Quality Score (LQS) system for our dataset marketplace and opened it up as a free standalone tool.

Upload a dataset → get a 0–100 score broken down across 7 dimensions with specific flags for what's degrading quality.

Supports CSV, Parquet, JSONL, COCO JSON, and YOLO, which covers most common ML formats.

Link: labelsets.ai/quality-audit

Not trying to pitch anything; I genuinely want to know whether the scoring makes sense to people who work with datasets professionally. Happy to discuss the methodology in the comments.

submitted by /u/plomii