Incentivizing Parametric Knowledge via Reinforcement Learning with Verifiable Rewards for Cross-Cultural Entity Translation

arXiv cs.CL / 4/21/2026


Key Points

  • The paper addresses the difficulty of cross-cultural entity translation with LLMs, noting that models often produce literal or phonetic renderings instead of context-appropriate translations.
  • It introduces EA-RLVR (Entity-Anchored Reinforcement Learning with Verifiable Rewards), a training framework that leverages parametric knowledge using verifiable entity-level rewards without relying on external knowledge bases.
  • EA-RLVR uses entity-level reward anchoring and lightweight structural gates to stabilize reinforcement learning and encourage robust reasoning rather than simple imitation of reference outputs.
  • Experiments on XC-Translate show that training with only 7k samples improves Qwen3-14B entity translation accuracy from 23.66% to 31.87% on a 50k test set with entirely unseen entities.
  • The approach also transfers to general translation quality, achieving +1.35 XCOMET on WMT24++ and +1.59 with extended optimization, with analyses linking gains to sampling efficiency and a more stable optimization landscape.
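The paper does not spell out the reward implementation here, but the combination of a verifiable entity-level signal with a structural gate can be sketched as follows. Everything below is a hypothetical illustration: the tag format, the exact-match criterion, and the gating rule are assumptions, not the authors' actual design.

```python
def entity_anchored_reward(output: str, reference_entities: list[str],
                           open_tag: str = "<translation>",
                           close_tag: str = "</translation>") -> float:
    """Hypothetical sketch of an entity-anchored verifiable reward.

    A lightweight structural gate zeroes the reward unless the output
    contains exactly one well-formed tagged span; otherwise the reward
    is the fraction of reference entity translations found verbatim in
    that span. (Illustrative only; not the paper's implementation.)
    """
    # Structural gate: require exactly one open/close tag pair, in order.
    if output.count(open_tag) != 1 or output.count(close_tag) != 1:
        return 0.0
    start = output.index(open_tag) + len(open_tag)
    end = output.index(close_tag)
    if end < start:
        return 0.0
    translation = output[start:end]

    # Verifiable entity-level signal: exact-match hits on reference entities.
    if not reference_entities:
        return 0.0
    hits = sum(1 for entity in reference_entities if entity in translation)
    return hits / len(reference_entities)
```

Because the signal is checkable string matching rather than a learned reward model, it cannot be gamed by fluent-but-wrong paraphrases, which is the general appeal of verifiable rewards in RLVR setups.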

Abstract

Cross-cultural entity translation remains challenging for large language models (LLMs), as they usually yield literal or phonetic renderings instead of culturally appropriate translations in context. However, relevant knowledge may already be encoded in model parameters during large-scale pre-training. To incentivize the effective use of parametric knowledge, we propose EA-RLVR (Entity-Anchored Reinforcement Learning with Verifiable Rewards), a training framework that optimizes cross-cultural entity translation without relying on external knowledge bases. EA-RLVR anchors supervision on a verifiable, entity-level reward signal and incorporates lightweight structural gates to stabilize optimization. This design steers the model toward learning a robust reasoning process rather than merely imitating reference translations. We evaluate EA-RLVR on XC-Translate and observe consistent improvements in both entity translation accuracy and out-of-domain generalization. Specifically, training on merely 7k samples boosts Qwen3-14B's entity translation accuracy from 23.66% to 31.87% on a 50k test set comprising entirely unseen entities. The learned entity translation ability also transfers to general translation, yielding +1.35 XCOMET on WMT24++, which scales to +1.59 with extended optimization. Extensive analyses of pass@k dynamics and reward formulations attribute these gains to superior sampling efficiency and a stable optimization landscape.
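The pass@k dynamics mentioned in the abstract are presumably computed with the standard unbiased estimator of Chen et al. (2021): given n sampled outputs of which c are correct, pass@k estimates the probability that at least one of k randomly drawn samples is correct. A minimal sketch, assuming that estimator:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n: total samples drawn per problem
    c: number of correct samples among them
    k: number of samples the metric is allowed to draw
    """
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # samples must contain at least one correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Tracking this quantity over training shows whether RL is genuinely raising the chance of sampling a correct entity translation, which is the sampling-efficiency argument the analyses appeal to.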