What Kind of Language is Easy to Language-Model Under Curriculum Learning?

arXiv cs.CL / 4/30/2026


Key Points

  • The study investigates how typological tendencies of human languages (from rare to common feature combinations) relate to what language models learn under different training setups.
  • It asks whether the models’ own learning bias is sufficient to reproduce observed cross-linguistic patterns, and extends the analysis by explicitly adding the training scenario dimension.
  • As an initial step, the researchers test curriculum learning by training on simpler sentences first rather than using randomly ordered inputs.
  • They find that curriculum learning substantially changes the “apparent inductive bias” of language models, altering how those models exhibit typological effects.
  • Overall, the work suggests that reported language-model biases may depend strongly on data ordering and the learning curriculum, rather than reflecting only model-intrinsic tendencies.

Abstract

Many of the thousands of attested languages share common configurations of features, creating a spectrum from typologically very rare (e.g., object-verb-subject word order) or impossible languages to very common combinations of features (e.g., subject-object-verb word order). One central question is under what conditions such typological tendencies can be predicted, and specifically whether the learning bias of language models (LMs) is sufficient to reproduce such patterns. In this study, we add one dimension to such analyses, namely the learning scenario for LMs, and explore its interaction with the inductive bias of LMs. Specifically, as a first study, we examine the effect of curriculum learning (CL), a developmentally motivated learning scenario in which training starts with simpler sentences rather than randomly ordered input. We extend existing LM-based explorations (El-Naggar et al., 2025a,b) with a simple CL variant and find that CL substantially impacts the apparent inductive bias of LMs.
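To make the contrast between the two learning scenarios concrete, here is a minimal sketch of how a random-order baseline and a simple curriculum might order the same training sentences. It assumes sentence length in whitespace tokens as the proxy for "simpler"; the abstract does not state how the paper's CL variant measures simplicity, and the function names (`random_order`, `length_curriculum`, `batches`) are hypothetical, not taken from the paper or from El-Naggar et al.

```python
import random
from typing import List


def random_order(sentences: List[str], seed: int = 0) -> List[str]:
    """Baseline scenario: shuffle the training sentences uniformly at random."""
    rng = random.Random(seed)  # seeded generator so the order is reproducible
    shuffled = list(sentences)  # copy so the caller's list is not mutated
    rng.shuffle(shuffled)
    return shuffled


def length_curriculum(sentences: List[str]) -> List[str]:
    """Hypothetical curriculum: present shorter (assumed simpler) sentences first.

    Assumption: length in whitespace tokens stands in for difficulty.
    sorted() is stable, so sentences of equal length keep their original order.
    """
    return sorted(sentences, key=lambda s: len(s.split()))


def batches(ordered: List[str], batch_size: int) -> List[List[str]]:
    """Split an already ordered sentence list into consecutive training batches."""
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


if __name__ == "__main__":
    corpus = [
        "the dog that the cat chased barked",
        "dogs bark",
        "the cat sees the dog",
        "birds sing loudly",
    ]
    # Under the curriculum, the first batch holds the two shortest sentences.
    for batch in batches(length_curriculum(corpus), batch_size=2):
        print(batch)
```

Only the presentation order differs between the two scenarios; the model, data, and number of updates stay the same, which is what lets any change in the learned biases be attributed to ordering. A real curriculum could swap in another difficulty measure (e.g., syntactic depth) without changing the surrounding training loop.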