ReflectMT: Internalizing Reflection for Efficient and High-Quality Machine Translation

arXiv cs.CL / 4/22/2026


Key Points

  • ReflectMT is a new machine translation approach that replaces “think-first-then-translate” with a more efficient “translate-first-think-later” paradigm.
  • The method uses a two-stage reinforcement learning process: first it improves reflection and refinement quality, then it trains the model to internalize what it learns from reflection.
  • After training, ReflectMT performs direct translation at inference time, producing high-quality outputs without any explicit multi-step reasoning traces.
  • Experiments on benchmarks including WMT24 show that ReflectMT's first-pass translations outperform multi-step reasoning models such as DeepSeek-R1 on both automatic metrics and GPT-based evaluation, while cutting token usage by 94.33%.
  • The work reports a 2.16-point gain on GPT-based translation quality evaluation, pairing a measurable quality improvement with the large inference-efficiency savings.
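To make the efficiency claim concrete, here is the arithmetic behind a 94.33% token reduction. The baseline budget below is a hypothetical illustration; only the 94.33% figure comes from the paper.

```python
# Worked example of the reported 94.33% token reduction.
# baseline_tokens is a hypothetical per-translation budget for a
# "think-first" reasoning model; it is NOT a number from the paper.
reduction = 0.9433
baseline_tokens = 1200

reflectmt_tokens = baseline_tokens * (1 - reduction)
print(round(reflectmt_tokens))  # -> 68 tokens for a direct first-pass translation
```

Under this illustrative budget, the explicit reasoning trace accounts for all but a few dozen of the tokens a think-first model spends per translation.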

Abstract

Recent years have witnessed growing interest in applying Large Reasoning Models (LRMs) to Machine Translation (MT). Existing approaches predominantly adopt a "think-first-then-translate" paradigm. Although explicit reasoning trajectories significantly enhance translation quality, they incur prohibitive inference costs and latency. To address these limitations, we propose ReflectMT, a two-stage reflection internalization algorithm for machine translation that employs a "translate-first-think-later" paradigm. Our approach develops the model's "translate-reflect-refine" capability through reinforcement learning. In the first stage, we cultivate the model's capacity for high-quality reflection and refinement, thereby enhancing its semantic comprehension and task-specific knowledge. In the second stage, we train the model to internalize the knowledge acquired during reflection. As a result, during inference, ReflectMT operates in a direct translation mode, producing high-quality translations on the first attempt without any explicit reasoning steps. Experimental results on datasets such as WMT24 demonstrate that our model's first-pass translations during inference outperform multi-step reasoning LRMs such as DeepSeek-R1 in both automatic metrics and GPT-based evaluation, achieving a 2.16-point improvement in GPT-based translation quality evaluation while reducing token consumption by 94.33%.
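The control flow of the two stages described above can be sketched as follows. This is a toy illustration of the "translate-reflect-refine" structure, not the paper's implementation: the real method trains an LLM with reinforcement learning, whereas the `translate`, `reflect`, and `refine` functions here are hypothetical dictionary-based stand-ins.

```python
# Toy sketch of ReflectMT's two modes. All three helper functions are
# mock stand-ins for an LLM; only the overall control flow mirrors the paper.

LEXICON = {"hallo": "hello", "welt": "world"}
TARGET_VOCAB = set(LEXICON.values())

def translate(source: str) -> str:
    """First-pass translation (toy: word-for-word lookup)."""
    return " ".join(LEXICON.get(w, w) for w in source.lower().split())

def reflect(draft: str) -> str:
    """Training-time reflection: critique the draft (toy: flag untranslated words)."""
    unknown = [w for w in draft.split() if w not in TARGET_VOCAB]
    return f"untranslated: {unknown}" if unknown else "ok"

def refine(draft: str, critique: str) -> str:
    """Training-time refinement: act on the critique (toy: drop flagged words)."""
    if critique == "ok":
        return draft
    return " ".join(w for w in draft.split() if w in TARGET_VOCAB)

def stage1_rollout(source: str) -> str:
    """Stage 1: explicit translate -> reflect -> refine trajectory,
    whose final quality would be rewarded by RL."""
    draft = translate(source)
    return refine(draft, reflect(draft))

def inference(source: str) -> str:
    """After stage 2 internalizes the reflection, inference is one direct pass."""
    return translate(source)

print(stage1_rollout("hallo welt"))  # -> hello world
print(inference("hallo welt"))       # -> hello world
```

The point of stage 2 is that, after training, `inference` alone should match the quality of the full `stage1_rollout`, with no reflection tokens emitted at test time.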