Why Better Cross-Lingual Alignment Fails for Better Cross-Lingual Transfer: Case of Encoders

arXiv cs.CL / 3/20/2026

Key Points

  • The paper shows that better cross-lingual alignment often fails to improve token-level downstream transfer despite increasing embedding similarity.
  • It analyzes four XLM-R encoder models aligned on different language pairs and fine-tuned for POS tagging or sentence classification, using representational analyses such as embedding distances, gradient similarities, and gradient magnitudes.
  • The results reveal that embedding distances are unreliable predictors of task performance and that alignment and task gradients are largely orthogonal, meaning optimizing one objective may contribute little to the other.
  • Based on these insights, the authors provide practical guidelines for combining cross-lingual alignment with task-specific fine-tuning and emphasize careful loss selection.
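
The gradient-orthogonality finding above rests on measuring the cosine similarity between the alignment-loss gradient and the task-loss gradient with respect to shared parameters. A minimal NumPy sketch of that measurement is below; the toy losses (a squared-error task loss and an embedding-distance alignment loss) and all variable names are illustrative assumptions, not the paper's actual setup or code.

```python
import numpy as np

def grad_cosine(g_task, g_align):
    """Cosine similarity between two flattened gradient tensors.
    Values near 0 indicate near-orthogonal objectives: a step on
    one loss barely moves the other."""
    g_task, g_align = np.ravel(g_task), np.ravel(g_align)
    denom = np.linalg.norm(g_task) * np.linalg.norm(g_align)
    return float(g_task @ g_align / denom) if denom else 0.0

# Toy illustration (NOT the paper's models): a single shared weight
# matrix W, one "task" gradient and one "alignment" gradient.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))                 # shared encoder weights
x_src, x_tgt = rng.normal(size=3), rng.normal(size=3)
y = rng.normal(size=4)                      # task target

# Task loss  L_t = ||W x_src - y||^2  ->  dL/dW = 2 (W x_src - y) x_src^T
g_task = 2 * np.outer(W @ x_src - y, x_src)

# Alignment loss  L_a = ||W x_src - W x_tgt||^2, with d = x_src - x_tgt
# ->  dL/dW = 2 (W d) d^T
d = x_src - x_tgt
g_align = 2 * np.outer(W @ d, d)

cos = grad_cosine(g_task, g_align)
```

In a real experiment the gradients would come from backpropagation through the encoder (e.g. `torch.autograd.grad` on each loss separately), but the orthogonality diagnostic itself is just this cosine.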

Abstract

Better cross-lingual alignment is often assumed to yield better cross-lingual transfer. However, explicit alignment techniques, despite increasing embedding similarity, frequently fail to improve token-level downstream performance. In this work, we show that this mismatch arises because alignment and downstream task objectives are largely orthogonal, and because the downstream benefits from alignment vary substantially across languages and task types. We analyze four XLM-R encoder models aligned on different language pairs and fine-tuned for either POS Tagging or Sentence Classification. Using representational analyses, including embedding distances, gradient similarities, and gradient magnitudes for both task and alignment losses, we find that: (1) embedding distances alone are unreliable predictors of improvements (or degradations) in task performance and (2) alignment and task gradients are often close to orthogonal, indicating that optimizing one objective may contribute little to optimizing the other. Taken together, our findings explain why "better" alignment often fails to translate into "better" cross-lingual transfer. Based on these insights, we provide practical guidelines for combining cross-lingual alignment with task-specific fine-tuning, highlighting the importance of careful loss selection.