AI Navigate

[D] Releasing a professional MQM-annotated MT dataset (16 lang pairs, 48 annotators)

Reddit r/MachineLearning / 3/17/2026

📰 News · Tools & Practical Usage · Models & Research

Key Points

  • The article reports the open-sourcing of a professional MQM-annotated MT dataset: 362 translation segments across 16 language pairs, annotated by 48 professional linguists (not crowdsourced).
  • It uses full MQM error annotations with category, severity, and span, and includes multiple annotators per segment to enable inter-annotator agreement analysis.
  • The methodology follows WMT guidelines, achieving a Kendall's tau of 0.317 for inter-annotator agreement, roughly 2.6 times what typical WMT campaigns report, highlighting the value of consistent annotator training.
  • The dataset is hosted on Hugging Face (alconost/mqm-translation-gold), with an open invitation for questions and feedback on the annotation process.

Hey all,

We've been doing translation quality evaluation work and decided to open-source one of our annotated datasets. Most MT test sets out there either have crowdsourced (noisy) annotations or are locked behind paywalls - we wanted to put something out with proper professional linguist annotations.

What's in it:

  • 362 translation segments
  • 16 language pairs
  • 48 professional linguists (not crowdsourced)
  • Full MQM error annotations (category, severity, span)
  • Multiple annotators per segment for IAA analysis
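Span-level MQM annotations like those above are usually rolled up into a segment-level penalty score. A minimal sketch, assuming the common WMT MQM severity weights (minor = 1, major = 5, critical = 25); the record layout and field names here are illustrative, not the dataset's actual schema:

```python
# Common WMT-style MQM severity weights (assumed, not from the dataset card).
SEVERITY_WEIGHTS = {"neutral": 0.0, "minor": 1.0, "major": 5.0, "critical": 25.0}

def mqm_score(errors):
    """Segment-level MQM penalty: sum of severity weights over all
    annotated error spans. Lower is better; 0.0 means no errors found."""
    return sum(SEVERITY_WEIGHTS[e["severity"]] for e in errors)

# Hypothetical annotations for one segment: category, severity, character span.
errors = [
    {"category": "accuracy/mistranslation", "severity": "major", "span": (4, 12)},
    {"category": "fluency/grammar", "severity": "minor", "span": (20, 24)},
]
print(mqm_score(errors))  # 6.0
```

With multiple annotators per segment, you get one such score per annotator, which is what makes the IAA analysis below possible.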

The methodology follows WMT guidelines - same error typology, same severity levels. We hit Kendall's τ = 0.317 on inter-annotator agreement, which is ~2.6x what typical WMT campaigns report. Not saying we're special, just that consistent annotator training seems to matter a lot.
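For readers who want to reproduce that kind of agreement number on the released data: Kendall's τ compares how often two annotators rank pairs of segments the same way. A minimal tau-a sketch (no tie correction, unlike the variants some WMT tooling uses) over made-up per-segment scores from two annotators:

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a for paired scores:
    (concordant pairs - discordant pairs) / total pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1   # both annotators order the pair the same way
        elif s < 0:
            discordant += 1   # the annotators disagree on the ordering
        # s == 0: a tie in either list; tau-a counts it as neither
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical segment-level MQM scores from two annotators.
annotator_a = [0, 1, 5, 6, 2]
annotator_b = [1, 0, 6, 5, 2]
print(kendall_tau(annotator_a, annotator_b))  # 0.6
```

In practice you'd use `scipy.stats.kendalltau`, which handles ties; this hand-rolled version just makes the pair-counting explicit.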

Dataset: https://huggingface.co/datasets/alconost/mqm-translation-gold

Happy to answer questions about the annotation process or methodology - and if anyone digs in and spots issues with the data, we'd genuinely want to know.

submitted by /u/ritis88