Text Style Transfer with Machine Translation for Graphic Designs

arXiv cs.AI / 4/30/2026


Key Points

  • The paper addresses the challenge of translating text for graphic designs while preserving the original text styling, which requires highly accurate word alignment between source and translated text.
  • It proposes three new word-alignment methods for text style transfer, built on commercially available NMT and LLM translation technologies: NMT with custom input/output tags, an LLM with custom input/output tags, and a hybrid NMT+LLM strategy that uses unigram mappings (see the sketch after this list).
  • The authors evaluate alignment quality by comparing the proposed methods against an attention-head baseline to assess suitability for real graphic design workflows.
  • Results indicate that the strong attention-head baseline is more accurate than standalone LLM or NMT approaches and is comparable to the hybrid NMT+LLM method.
  • Overall, the results suggest that the strong attention-head baseline and the hybrid NMT+LLM approach are currently the most reliable routes to styling preservation in multilingual graphic design workflows.
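
The paper does not include code; the sketch below only illustrates how the tag-based alignment idea could work in practice, assuming each styled run in the source is wrapped in numbered tags and the translation system is asked to keep those tags around the corresponding target words. The tag format and the helpers `wrap_with_tags` and `extract_alignment` are hypothetical, not taken from the paper.

```python
import re

def wrap_with_tags(spans):
    """Wrap each styled source span in a numbered tag pair, e.g. <s0>...</s0>."""
    return " ".join(f"<s{i}>{span}</s{i}>" for i, span in enumerate(spans))

def extract_alignment(tagged_translation, num_spans):
    """Recover a span-id -> translated-text mapping from the tagged output."""
    alignment = {}
    for i in range(num_spans):
        match = re.search(rf"<s{i}>(.*?)</s{i}>", tagged_translation, re.S)
        if match:
            alignment[i] = match.group(1).strip()
    return alignment

source_spans = ["Summer", "Sale", "50%", "off"]
tagged_source = wrap_with_tags(source_spans)
# tagged_source would be sent to the NMT or LLM service for translation.
# Suppose the (hypothetical) French output preserves the tags:
tagged_translation = "<s1>Soldes</s1> <s0>d'été</s0> <s2>50 %</s2> <s3>de réduction</s3>"
print(extract_alignment(tagged_translation, len(source_spans)))
# {0: "d'été", 1: 'Soldes', 2: '50 %', 3: 'de réduction'}
```

In a real design pipeline, the recovered mapping would drive reapplication of each span's font, weight, and color to its translated counterpart.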

Abstract

Globalization of graphic designs, such as those used in marketing materials and magazines, is increasingly important for communicating with broad audiences. To accomplish this, the textual content in a graphic design must be accurately translated and its text styling preserved so that the result fits visually into the design. Preserving text styling requires highly accurate word alignment between the original and the translated text. The problem of word alignment between source and translated text is a long-standing one; the industry standards for extracting word alignments are Giza++ and attention probabilities from neural machine translation (NMT) models. In this paper, we explore three new methods for the word alignment problem as it applies to transferring text styles from the source to the translated text. The proposed methods are built on top of commercially available NMT and LLM translation technologies: NMT with custom input and output tags for text styling; an LLM with custom input and output tags; and a hybrid approach that uses NMT for translation followed by an LLM guided by unigram mappings. To analyze the performance of these solutions, their alignment results are compared with those of an attention-head approach to gauge their usability in graphic design applications. Interestingly, the strong attention-head baseline proves more accurate than the LLM and NMT approaches and on par with the hybrid NMT+LLM approach.
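
For context, the attention-head baseline referenced in the abstract typically links each target token to the source token it attends to most strongly in a cross-attention head. A minimal sketch of that idea, using an invented attention matrix and a hypothetical confidence threshold (neither is from the paper):

```python
import numpy as np

# Toy cross-attention weights from one decoder head:
# rows = target tokens, columns = source tokens.
attn = np.array([
    [0.05, 0.85, 0.05, 0.05],  # target 0 attends mostly to source 1
    [0.80, 0.10, 0.05, 0.05],  # target 1 attends mostly to source 0
    [0.05, 0.05, 0.80, 0.10],
    [0.05, 0.05, 0.10, 0.80],
])

def align_from_attention(attn, threshold=0.3):
    """Align each target token to its highest-attention source token,
    keeping only links whose weight clears the threshold."""
    links = []
    for tgt_idx, row in enumerate(attn):
        src_idx = int(np.argmax(row))
        if row[src_idx] >= threshold:
            links.append((src_idx, tgt_idx))
    return links

print(align_from_attention(attn))
# [(1, 0), (0, 1), (2, 2), (3, 3)]
```

Thresholding simply drops low-confidence links so that unaligned target tokens do not inherit the wrong styling.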