XITE: Cross-lingual Interpolation for Transfer using Embeddings
arXiv cs.CL / 4/28/2026
Key Points
- The paper introduces XITE, an embedding-based data augmentation method to improve cross-lingual transfer for multilingual language models using unlabeled text from low-resource target languages.
- XITE finds an English counterpart in a task-specific corpus via embedding similarity, copies the label, and then interpolates source and target embeddings to generate synthetic training data for fine-tuning.
- It further projects target text into a language-rich subspace using linear discriminant analysis (LDA) before interpolation, which boosts performance.
- Experiments on XLM-R across multiple low-resource languages (e.g., Korean, Arabic, Urdu, Hindi) show large gains: up to 35.91% on sentiment analysis and up to 81.16% on natural language inference.
- The approach also supports adaptation without significant forgetting: performance on high-resource languages is preserved while transfer to low-resource ones improves.
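The retrieve-then-interpolate step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the embedding dimensionality, and the fixed mixing coefficient `lam` are all assumptions, and the data here is random placeholder input.

```python
import numpy as np

def nearest_source_example(target_emb, source_embs, source_labels):
    # Cosine similarity between the target-language embedding and every
    # labeled source (English) embedding in the task-specific corpus.
    sims = source_embs @ target_emb / (
        np.linalg.norm(source_embs, axis=1) * np.linalg.norm(target_emb)
    )
    idx = int(np.argmax(sims))
    # Copy the label of the most similar English example.
    return source_embs[idx], source_labels[idx]

def interpolate(target_emb, source_emb, lam=0.5):
    # Mixup-style linear interpolation of source and target embeddings;
    # the synthetic training example inherits the retrieved source label.
    return lam * source_emb + (1.0 - lam) * target_emb

# Toy usage with random embeddings (hypothetical 768-dim model space).
rng = np.random.default_rng(0)
source_embs = rng.normal(size=(100, 768))
source_labels = rng.integers(0, 2, size=100)
target_emb = rng.normal(size=768)

src_emb, label = nearest_source_example(target_emb, source_embs, source_labels)
synthetic = interpolate(target_emb, src_emb, lam=0.5)
```

The paper additionally projects the target embedding into a language-rich subspace via LDA before this interpolation; that projection is omitted here for brevity.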