Translationese as a Rational Response to Translation Task Difficulty

arXiv cs.CL / 3/13/2026

Key Points

The paper argues that translationese partly reflects cognitive load in the translation task, not solely production tendencies or socio-cultural factors.
It tests whether observable translationese can be predicted from quantifiable measures of task difficulty, split into source-text and cross-lingual transfer components. These are operationalised mainly through information-theoretic metrics based on LLM surprisal, complemented by established syntactic and semantic features.
On a bidirectional English-German corpus with written and spoken subcorpora, task difficulty partly explains translationese, especially for English-to-German; in most experiments, cross-lingual transfer difficulty contributes more than source-text complexity.
In written mode, information-theoretic indicators match or outperform traditional features, while in spoken mode they provide no advantage; source-text syntactic complexity and translation-solution entropy emerge as the strongest predictors across language pairs and modes.
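
To make the information-theoretic indicators concrete, here is a minimal sketch of the two textbook quantities involved: per-token surprisal and Shannon entropy over candidate translations. It assumes that token probabilities and candidate-translation probabilities have already been obtained from some language or translation model. The function names and inputs are hypothetical illustrations, and the paper's exact operationalisation is not given in this summary.

```python
import math


def token_surprisals(token_probs):
    """Surprisal in bits for each token: -log2 p(token | context).

    `token_probs` is assumed to hold model probabilities in (0, 1].
    """
    return [-math.log2(p) for p in token_probs]


def mean_surprisal(token_probs):
    """Average surprisal over a segment, one simple segment-level summary."""
    values = token_surprisals(token_probs)
    return sum(values) / len(values)


def solution_entropy(candidate_weights):
    """Shannon entropy in bits over candidate translations of one source unit.

    `candidate_weights` are non-negative scores or probabilities; they are
    normalised here so they sum to 1. Higher entropy means more competing
    translation solutions.
    """
    total = sum(candidate_weights)
    probs = [w / total for w in candidate_weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)
```

Under this reading, a source segment with one dominant translation has entropy near zero, while a segment with many equally plausible renderings has high entropy, which is one way to quantify cross-lingual transfer difficulty.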

Abstract

Translations systematically diverge from texts originally produced in the target language, a phenomenon widely referred to as translationese. Translationese has been attributed to production tendencies (e.g. interference, simplification), socio-cultural variables, and language-pair effects, yet a unified explanatory account is still lacking. We propose that translationese reflects cognitive load inherent in the translation task itself. We test whether observable translationese can be predicted from quantifiable measures of translation task difficulty. Translationese is operationalised as a segment-level translatedness score produced by an automatic classifier. Translation task difficulty is conceptualised as comprising source-text and cross-lingual transfer components, operationalised mainly through information-theoretic metrics based on LLM surprisal, complemented by established syntactic and semantic alternatives. We use a bidirectional English-German corpus comprising written and spoken subcorpora. Results indicate that translationese can be partly explained by translation task difficulty, especially in English-to-German. For most experiments, cross-lingual transfer difficulty contributes more than source-text complexity. Information-theoretic indicators match or outperform traditional features in written mode, but offer no advantage in spoken mode. Source-text syntactic complexity and translation-solution entropy emerged as the strongest predictors of translationese across language pairs and modes.
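
As a rough illustration of what "predicting translationese from task difficulty" can mean, the sketch below computes the share of variance in a segment-level translatedness score explained by a single difficulty predictor (the R² of a one-predictor linear fit). This is a deliberately simplified stand-in: the statistical model actually used in the paper is not described in this summary, and the numbers in the usage lines are invented placeholders, not results.

```python
def r_squared(predictor, outcome):
    """R^2 of a simple linear regression of `outcome` on `predictor`.

    For one predictor with an intercept, this equals the squared
    Pearson correlation between the two sequences.
    """
    n = len(predictor)
    mean_x = sum(predictor) / n
    mean_y = sum(outcome) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, outcome))
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    syy = sum((y - mean_y) ** 2 for y in outcome)
    return (sxy * sxy) / (sxx * syy)


# Invented toy values, for illustration only (not data from the paper).
translatedness = [0.2, 0.4, 0.5, 0.7, 0.9]      # classifier score per segment
source_complexity = [1.0, 1.5, 1.2, 2.0, 2.4]   # hypothetical source-text measure
transfer_entropy = [0.3, 0.9, 1.1, 1.6, 2.2]    # hypothetical transfer measure

r2_source = r_squared(source_complexity, translatedness)
r2_transfer = r_squared(transfer_entropy, translatedness)
```

Comparing such values per language direction and mode mirrors, in a very reduced form, the question of whether source-text or cross-lingual transfer difficulty contributes more.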