Large Language Models as Annotators for Machine Translation Quality Estimation
arXiv cs.CL / 3/12/2026
💬 Opinion · Models & Research
Key Points
- LLMs are proposed as generators of MQM-style annotations to train MT quality estimation models, addressing the high inference costs of using LLMs directly.
- The paper introduces a simplified MQM scheme limited to top-level categories and a GPT-4o-based prompt framework named PPbMQM.
- Results show the LLM-generated annotations correlate well with human annotations and that training COMET on them yields competitive segment-level QE performance for Chinese-English and English-German.
- This approach enables more cost-effective MTQE pipelines: the LLM is used once, at annotation time, to create training data, rather than being called at inference time during deployment.
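The pipeline described above hinges on turning LLM-produced MQM-style error annotations into segment-level scores that can supervise a QE model such as COMET. A minimal sketch of that conversion step, assuming conventional MQM severity weights (minor = 1, major = 5) and a hypothetical annotation format; the paper's exact PPbMQM prompt and weighting may differ:

```python
# Convert MQM-style error annotations (e.g. produced by an LLM annotator)
# into a segment-level quality score usable as a QE training target.
# Severity weights follow common MQM practice (minor=1, major=5);
# this is an illustrative sketch, not the paper's exact scheme.

SEVERITY_WEIGHTS = {"minor": 1.0, "major": 5.0}

def mqm_segment_score(errors):
    """errors: list of dicts like {"category": "accuracy", "severity": "major"}.
    Returns a negative penalty; 0.0 means no errors were annotated."""
    return -sum(SEVERITY_WEIGHTS.get(e["severity"], 0.0) for e in errors)

# Example: one major accuracy error plus one minor fluency error.
annotations = [
    {"category": "accuracy", "severity": "major"},
    {"category": "fluency", "severity": "minor"},
]
print(mqm_segment_score(annotations))  # -6.0
```

Scores of this form (or a normalized variant) can then serve as regression targets when fine-tuning a segment-level QE model on the LLM-generated annotations.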
Related Articles

- Math needs thinking time, everyday knowledge needs memory, and a new Transformer architecture aims to deliver both (THE DECODER)
- Kreuzberg v4.5.0: We loved Docling's model so much that we gave it a faster engine (Reddit r/LocalLLaMA)
- Today, what hardware to get for running large-ish local models like qwen 120b? (Reddit r/LocalLLaMA)
- Running mistral locally for meeting notes and it's honestly good enough for my use case (Reddit r/LocalLLaMA)
- [D] Single-artist longitudinal fine art dataset spanning 5 decades now on Hugging Face — potential applications in style evolution, figure representation, and ethical training data (Reddit r/MachineLearning)