Large Language Models as Annotators for Machine Translation Quality Estimation
arXiv cs.CL / 3/12/2026
💬 Opinion · Models & Research
Key Points
- LLMs are proposed as generators of MQM-style annotations to train MT quality estimation models, addressing the high inference costs of using LLMs directly.
- The paper introduces a simplified MQM scheme limited to top-level categories and a GPT-4o-based prompt framework named PPbMQM.
- Results show the LLM-generated annotations correlate well with human annotations and that training COMET on them yields competitive segment-level QE performance for Chinese-English and English-German.
- This approach enables more cost-effective MTQE pipelines by using LLMs for offline annotation rather than for inference at deployment time.
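The points above describe a pipeline in which LLM-emitted MQM-style error annotations are converted into segment-level quality scores for training a QE model. As a minimal sketch of that conversion step, the snippet below maps a list of (category, severity) errors to a normalized score; the severity weights (minor=1, major=5, critical=10) and the `max_penalty` cap follow a common MQM convention and are assumptions, not the paper's exact PPbMQM scheme.

```python
# Hypothetical sketch: turning simplified MQM-style error annotations
# (as an LLM annotator might emit them) into a segment-level score.
# Severity weights and the penalty cap are illustrative assumptions.

SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}

def mqm_score(errors, max_penalty=25.0):
    """Map a list of (category, severity) errors to a score in [0, 1].

    A segment with no errors scores 1.0; the score decreases linearly
    with the total severity penalty, floored at 0.0.
    """
    penalty = sum(SEVERITY_WEIGHTS.get(severity, 0) for _, severity in errors)
    return max(0.0, 1.0 - min(penalty, max_penalty) / max_penalty)

# Example: one major accuracy error plus one minor fluency error
errors = [("accuracy", "major"), ("fluency", "minor")]
print(mqm_score(errors))  # penalty 6 -> 1 - 6/25 = 0.76
```

Scores of this form could then serve as regression targets when fine-tuning a COMET-style QE model, replacing human-sourced MQM scores.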