Vocabulary shapes cross-lingual variation of word-order learnability in language models
arXiv cs.AI / 3/23/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The study investigates cross-lingual learnability of word order by pretraining transformer language models on synthetic word-order variants of natural languages.
- It finds that greater word-order irregularity raises model surprisal, indicating reduced learnability.
- Sentence reversal affects learnability only weakly, suggesting models are not uniformly sensitive to all word-order perturbations.
- The authors show that vocabulary structure (the word and subword inventory) predicts surprisal better than coarse free- versus fixed-order language classifications, identifying vocabulary as a key driver of cross-lingual learnability.
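The learnability proxy behind these findings is per-token surprisal, the negative log-probability a trained model assigns to each token. A minimal sketch of the idea, using a toy add-alpha bigram model as an illustrative stand-in for the paper's transformer models (all function names here are hypothetical, not from the paper):

```python
import math
from collections import Counter, defaultdict

BOS = "<s>"  # sentence-start symbol

def bigram_probs(corpus, alpha=1.0):
    """Train an add-alpha smoothed bigram model P(w | prev) on a corpus
    of tokenized sentences; returns a probability function."""
    vocab = {w for sent in corpus for w in sent} | {BOS}
    counts = defaultdict(Counter)
    for sent in corpus:
        prev = BOS
        for w in sent:
            counts[prev][w] += 1
            prev = w
    V = len(vocab)

    def prob(prev, w):
        c = counts[prev]
        return (c[w] + alpha) / (sum(c.values()) + alpha * V)

    return prob

def mean_surprisal(corpus, prob):
    """Mean per-token surprisal in bits: -log2 P(w | prev).
    Higher values indicate the corpus is harder for the model."""
    total, n = 0.0, 0
    for sent in corpus:
        prev = BOS
        for w in sent:
            total += -math.log2(prob(prev, w))
            prev = w
            n += 1
    return total / n

def reversed_variant(corpus):
    """Synthetic word-order variant: reverse each sentence,
    analogous to the paper's sentence-reversal perturbation."""
    return [list(reversed(sent)) for sent in corpus]

if __name__ == "__main__":
    train = [["the", "dog", "runs"], ["the", "cat", "sleeps"],
             ["a", "dog", "sleeps"], ["a", "cat", "runs"]]
    prob = bigram_probs(train)
    # A model trained on the canonical order assigns lower surprisal
    # to canonical-order text than to a reordered variant.
    print(mean_surprisal(train, prob))
    print(mean_surprisal(reversed_variant(train), prob))
```

In the paper's setup, the same comparison is made with transformer language models pretrained separately on each word-order variant; the toy model here only illustrates why irregular or perturbed orderings inflate surprisal relative to the order the model was trained on.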