XITE: Cross-lingual Interpolation for Transfer using Embeddings

arXiv cs.CL / 4/28/2026


Key Points

  • The paper introduces XITE, an embedding-based data augmentation method to improve cross-lingual transfer for multilingual language models using unlabeled text from low-resource target languages.
  • XITE finds an English counterpart in a task-specific corpus via embedding similarity, copies the label, and then interpolates source and target embeddings to generate synthetic training data for fine-tuning.
  • It further projects target text into a language-rich subspace using linear discriminant analysis (LDA) before interpolation, which boosts performance.
  • Experiments with XLM-R on a diverse set of target languages (including Korean, Arabic, Urdu and Hindi) show improvements of up to 35.91% for sentiment analysis and up to 81.16% for natural language inference.
  • Besides improving cross-lingual transfer, adaptation with XITE guards against forgetting: task performance on the high-resource source language (English) is maintained.
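The retrieve, copy-label and interpolate steps in the key points can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: texts are already encoded as fixed-size sentence embeddings (for example pooled XLM-R vectors), "embedding similarity" is taken to be cosine similarity, the counterpart is the single nearest English example, and interpolation uses one fixed weight `lam`. The function names `cosine_sim_matrix` and `xite_style_augment` are hypothetical.

```python
import numpy as np


def cosine_sim_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (n, d) and rows of b (m, d)."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T


def xite_style_augment(target_emb, source_emb, source_labels, lam=0.5):
    """Hypothetical sketch: nearest English neighbour -> adopt label -> interpolate.

    target_emb:    (n, d) embeddings of unlabeled target-language texts
    source_emb:    (m, d) embeddings of labeled English training texts
    source_labels: (m,)   NumPy array of labels for the English texts
    lam:           weight on the target embedding (assumed fixed, not from the paper)
    """
    sims = cosine_sim_matrix(target_emb, source_emb)  # (n, m) similarity scores
    nn_idx = sims.argmax(axis=1)                      # most similar English example per target text
    pseudo_labels = source_labels[nn_idx]             # adopt the neighbour's label
    # Convex combination of each target embedding and its English counterpart.
    mixed = lam * target_emb + (1.0 - lam) * source_emb[nn_idx]
    return mixed, pseudo_labels, nn_idx


# Toy usage: five orthogonal "English" embeddings and two target texts
# that lie closest to English examples 3 and 0 respectively.
src = np.eye(5, 8)
labels = np.array([0, 1, 2, 1, 0])
tgt = src[[3, 0]] + 0.1
mixed, y, idx = xite_style_augment(tgt, src, labels, lam=0.5)
```

The resulting `(mixed, y)` pairs would serve as synthetic labeled data for task-specific fine-tuning. How the interpolation weight is chosen (fixed, tuned, or sampled per example as in mixup) is not stated in this summary, so `lam` is purely illustrative.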

Abstract

Facilitating cross-lingual transfer in multilingual language models remains a critical challenge. Towards this goal, we propose an embedding-based data augmentation technique called XITE. We start with unlabeled text from a low-resource target language, identify an English counterpart in a task-specific training corpus using embedding-based similarities and adopt its label. Next, we perform a simple interpolation of the source and target embeddings to create synthetic data for task-specific fine-tuning. Projecting the target text into a language-rich subspace using linear discriminant analysis (LDA), prior to interpolation, further boosts performance. Our cross-lingual embedding-based augmentation technique XITE yields significant improvements of up to 35.91% for sentiment analysis and up to 81.16% for natural language inference, using XLM-R, for a diverse set of target languages including Korean, Arabic, Urdu and Hindi. Apart from boosting cross-lingual transfer, adaptation using XITE also safeguards against forgetting and maintains task performance on the high-resource language.
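The abstract does not spell out how the LDA projection is built, so the sketch below shows one plausible reading only. It assumes LDA is fit on pooled sentence embeddings with language IDs as the class labels, so the top discriminant directions span a "language-rich" subspace, and that target embeddings are orthogonally projected onto that span while staying in the original dimension so they can still be interpolated with English embeddings. The helper names `lda_directions` and `project_to_subspace` and the ridge term `reg` are hypothetical.

```python
import numpy as np


def lda_directions(emb, lang_ids, k, reg=1e-6):
    """Fisher LDA using language IDs as class labels.

    emb:      (n, d) sentence embeddings from several languages
    lang_ids: (n,)   integer language label per row
    k:        number of directions to keep (at most n_languages - 1 are informative)
    Returns a (d, k) matrix whose columns are the top discriminant directions.
    """
    overall_mean = emb.mean(axis=0)
    d = emb.shape[1]
    s_w = np.zeros((d, d))  # within-language scatter
    s_b = np.zeros((d, d))  # between-language scatter
    for lang in np.unique(lang_ids):
        x_l = emb[lang_ids == lang]
        mu_l = x_l.mean(axis=0)
        centered = x_l - mu_l
        s_w += centered.T @ centered
        diff = (mu_l - overall_mean)[:, None]
        s_b += len(x_l) * (diff @ diff.T)
    # Solve (S_w + reg*I)^{-1} S_b w = lambda w; the small ridge keeps S_w invertible.
    m = np.linalg.solve(s_w + reg * np.eye(d), s_b)
    eigvals, eigvecs = np.linalg.eig(m)
    order = np.argsort(eigvals.real)[::-1][:k]  # largest eigenvalues first
    return eigvecs[:, order].real


def project_to_subspace(x, directions):
    """Orthogonally project rows of x onto span(directions), keeping dimension d."""
    q, _ = np.linalg.qr(directions)  # (d, k) orthonormal basis of the subspace
    return x @ q @ q.T


# Toy usage: two "languages" whose embeddings differ mainly along dimension 0.
rng = np.random.default_rng(0)
lang_a = rng.normal(size=(50, 4))
lang_b = rng.normal(size=(50, 4)) + np.array([5.0, 0.0, 0.0, 0.0])
emb = np.vstack([lang_a, lang_b])
lang_ids = np.array([0] * 50 + [1] * 50)

W = lda_directions(emb, lang_ids, k=1)
target_proj = project_to_subspace(lang_b, W)
```

Under this reading, each projected target vector would then be mixed with its retrieved English counterpart, for example `lam * target_proj[i] + (1 - lam) * source_nn[i]`, before fine-tuning. Whether the paper projects onto, or instead removes, the LDA subspace, and how many directions it keeps, should be checked against the full text.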