Toward domain-specific machine translation and quality estimation systems
arXiv cs.AI / 3/27/2026
Opinion | Ideas & Deep Analysis | Models & Research
Key Points
- The dissertation argues that machine translation (MT) and quality estimation (QE) degrade when moving from general to specialized domains and focuses on data-driven adaptation strategies to address this gap.
- It proposes similarity-based in-domain data selection for MT, showing that small targeted subsets can outperform much larger generic datasets while reducing computational cost.
- For QE, it introduces a staged training pipeline that combines domain adaptation with lightweight data augmentation and improves results across domains, languages, and resource settings, including zero-shot and cross-lingual cases.
- It finds that subword tokenization and vocabulary alignment are critical during fine-tuning: mismatched tokenization-vocabulary configurations destabilize training and hurt translation quality.
- It also presents a QE-guided in-context learning approach for large language models that selects examples to improve translation quality without parameter updates and can operate in a reference-free setup.
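
To make the data-selection idea in the key points more concrete, here is a minimal sketch of similarity-based in-domain selection. It is not the dissertation's method: the summary does not say which similarity measure is used, so this example assumes a plain bag-of-words cosine similarity, and the function names (`bag_of_words`, `cosine`, `select_in_domain`), the toy sentences, and the placeholder targets are all hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase whitespace tokenization into a term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_in_domain(pool, seed, k):
    """Return the k (source, target) pairs whose source side is most
    similar to the pooled vocabulary of the in-domain seed sentences."""
    seed_vec = Counter()
    for sentence in seed:
        seed_vec.update(bag_of_words(sentence))
    ranked = sorted(
        pool,
        key=lambda pair: cosine(bag_of_words(pair[0]), seed_vec),
        reverse=True,
    )
    return ranked[:k]


# Toy in-domain seed (assumed medical domain) and a generic parallel pool.
# Target sides are placeholders, not real translations.
seed = [
    "the patient received a dose of the drug",
    "adverse reactions in the patient were reported",
]
pool = [
    ("the patient should take one dose daily", "TARGET_0"),
    ("football fans celebrated a late goal", "TARGET_1"),
    ("adverse reactions were reported after the drug", "TARGET_2"),
    ("stock prices fell sharply on monday", "TARGET_3"),
]

selected = select_in_domain(pool, seed, k=2)
for source, target in selected:
    print(source, "->", target)
```

In practice the similarity function would more plausibly be computed over sentence embeddings than raw word counts, but the ranking-and-truncation step would look the same: score every generic pair against the in-domain seed, then fine-tune only on the top `k`.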