Polyglot Teachers: Evaluating Language Models for Multilingual Synthetic Data Generation
arXiv cs.CL / 4/14/2026
Key Points
- The paper argues that using the largest available language model as a “teacher” for multilingual synthetic SFT data is often ad hoc and can produce low-quality data that hurts smaller student models.
- It introduces a multilingual evaluation approach (“Polyglot Score”) and reports experiments with 10 language models across 6 typologically diverse languages, generating 1.4M+ SFT examples and training 240 student models.
- Gemma 3 27B and Aya Expanse 32B are found to be consistently effective multilingual teacher models across different student base model families.
- The study finds teacher effectiveness is not well predicted by model scale alone; instead, intrinsic data qualities such as prompt diversity, response length, and fluency explain most of the variance in data quality and correlate with student performance (see the sketch after this list).
- The authors provide practical recommendations for teacher-student model pairing, along with strategies such as translating existing prompts or responding to them directly, to improve synthetic data for less-resourced languages.
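
Below is a minimal sketch, not the paper's Polyglot Score, of how the intrinsic qualities named above could be proxied in code: prompt diversity via the distinct-2 ratio and response length as a mean token count. The `prompt`/`response` field names and the whitespace tokenizer are illustrative assumptions, not the paper's schema.

```python
from collections import Counter

def distinct_n(texts, n=2):
    """Fraction of unique n-grams across all texts (higher = more diverse)."""
    ngrams = Counter()
    for text in texts:
        tokens = text.split()
        for i in range(len(tokens) - n + 1):
            ngrams[tuple(tokens[i:i + n])] += 1
    total = sum(ngrams.values())
    return len(ngrams) / total if total else 0.0

def mean_response_length(texts):
    """Average whitespace-token length of responses."""
    return sum(len(t.split()) for t in texts) / len(texts) if texts else 0.0

# Toy dataset shaped like synthetic SFT examples (assumed schema).
dataset = [
    {"prompt": "Explain photosynthesis in Swahili.", "response": "..."},
    {"prompt": "Traduis cette phrase en français.", "response": "..."},
]
print("prompt distinct-2:", distinct_n([ex["prompt"] for ex in dataset]))
print("mean response len:", mean_response_length([ex["response"] for ex in dataset]))
```

In the paper's setup, metrics along these lines would presumably be computed per teacher model and per language, then compared against student performance after SFT.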