XITE: Cross-lingual Interpolation for Transfer using Embeddings

arXiv cs.CL / 4/28/2026

Key Points

The paper introduces XITE, an embedding-based data augmentation method to improve cross-lingual transfer for multilingual language models using unlabeled text from low-resource target languages.
For each unlabeled target-language text, XITE finds an English counterpart in a labeled, task-specific corpus via embedding similarity, adopts that example's label, and then interpolates the source and target embeddings to generate synthetic training data for fine-tuning.
Projecting the target text into a language-rich subspace using linear discriminant analysis (LDA) before interpolation further boosts performance.
Experiments with XLM-R on several low-resource target languages (e.g., Korean, Arabic, Urdu and Hindi) show gains of up to 35.91% for sentiment analysis and up to 81.16% for natural language inference.
Adaptation with XITE also guards against forgetting: task performance on the high-resource language (English) is maintained while transfer to the low-resource languages improves.
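The retrieve, adopt-label and interpolate steps in the key points can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: embeddings are assumed to be precomputed by some multilingual encoder, cosine similarity is assumed as the similarity measure, and the helper names `cosine_sim_matrix` and `xite_style_augment`, the fixed interpolation weight `alpha`, and the toy vectors are all hypothetical.

```python
import numpy as np

# Hypothetical sketch of a XITE-style augmentation step; names and alpha are assumptions.

def cosine_sim_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a (n, d) and b (m, d)."""
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_n @ b_n.T

def xite_style_augment(target_emb, source_emb, source_labels, alpha=0.5):
    """Pair each target row with its most similar English row, adopt that row's label,
    and linearly interpolate the two embeddings."""
    sims = cosine_sim_matrix(target_emb, source_emb)
    nn_idx = sims.argmax(axis=1)                 # index of the closest English example
    labels = source_labels[nn_idx]               # adopt the neighbour's label
    mixed = alpha * target_emb + (1.0 - alpha) * source_emb[nn_idx]
    return mixed, labels

# Toy data: two labelled English embeddings and two unlabelled target embeddings.
source_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
source_labels = np.array([1, 0])
target_emb = np.array([[0.9, 0.1], [0.2, 0.8]])

mixed, labels = xite_style_augment(target_emb, source_emb, source_labels, alpha=0.5)
print(labels)   # [1 0]
print(mixed)
```

A fixed `alpha` keeps the sketch deterministic; whether the paper uses a fixed weight, a sampled one, or interpolates at a particular model layer is not stated here.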

Abstract

Facilitating cross-lingual transfer in multilingual language models remains a critical challenge. Towards this goal, we propose an embedding-based data augmentation technique called XITE. We start with unlabeled text from a low-resource target language, identify an English counterpart in a task-specific training corpus using embedding-based similarities and adopt its label. Next, we perform a simple interpolation of the source and target embeddings to create synthetic data for task-specific fine-tuning. Projecting the target text into a language-rich subspace using linear discriminant analysis (LDA), prior to interpolation, further boosts performance. Our cross-lingual embedding-based augmentation technique XITE yields significant improvements of up to 35.91% for sentiment analysis and up to 81.16% for natural language inference, using XLM-R, for a diverse set of target languages including Korean, Arabic, Urdu and Hindi. Apart from boosting cross-lingual transfer, adaptation using XITE also safeguards against forgetting and maintains task performance on the high-resource language.
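The LDA step admits more than one reading. The sketch below assumes one plausible interpretation, not the paper's confirmed procedure: fit a two-class Fisher LDA whose classes are the two languages (English vs. the target language), treat the resulting discriminant direction as the "language-rich" subspace, orthogonally project the target embeddings onto it, and then interpolate with English embeddings that are assumed to be already paired row by row. The function names, the `ridge` regularizer and the toy data are hypothetical.

```python
import numpy as np

# Hypothetical sketch: the two-class setup, ridge term and helper names are assumptions.

def fisher_direction(x_src: np.ndarray, x_tgt: np.ndarray, ridge: float = 1e-3) -> np.ndarray:
    """Unit-norm Fisher LDA direction separating source (English) from target-language rows."""
    mu_s, mu_t = x_src.mean(axis=0), x_tgt.mean(axis=0)
    # Within-class scatter summed over the two language classes.
    s_w = (x_src - mu_s).T @ (x_src - mu_s) + (x_tgt - mu_t).T @ (x_tgt - mu_t)
    s_w = s_w + ridge * np.eye(s_w.shape[0])  # small ridge keeps s_w invertible
    w = np.linalg.solve(s_w, mu_t - mu_s)      # w proportional to S_w^{-1} (mu_t - mu_s)
    return w / np.linalg.norm(w)

def project_onto(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Orthogonal projection of each row of x onto span(w); w must be unit-norm."""
    return np.outer(x @ w, w)

def interpolate(src: np.ndarray, tgt: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Row-wise linear interpolation: alpha * tgt + (1 - alpha) * src."""
    return alpha * tgt + (1.0 - alpha) * src

# Toy 2-D embeddings: language differs along axis 0, task content along axis 1.
x_src = np.array([[-1.0, 1.0], [-1.0, -1.0]])   # English rows
x_tgt = np.array([[1.0, 1.0], [1.0, -1.0]])     # target rows, paired row by row with x_src

w = fisher_direction(x_src, x_tgt)
tgt_proj = project_onto(x_tgt, w)
mixed = interpolate(x_src, tgt_proj, alpha=0.5)
print(w, tgt_proj, mixed, sep="\n")
```

With only two classes, Fisher LDA yields a single discriminant direction, so each projected target vector keeps only its language component; with K languages one would instead obtain up to K-1 directions.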
