Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution
arXiv cs.CL / 4/7/2026
Key Points
- The paper studies co-evolutionary self-play for LLM curriculum learning, where one model proposes math problems and another solves them, and shows that training can suffer from “diversity collapse” where the proposer converges to a narrow, reward-satisfying problem distribution.
- It introduces "vocabulary dropout": a hard, non-stationary random masking of the proposer's output logits, applied during both policy training and curriculum generation, so that the proposer cannot lock into fixed token sequences.
- Experiments training Qwen3-4B and Qwen3-8B via R-Zero on mathematical reasoning indicate that vocabulary dropout preserves proposer diversity across lexical, semantic, and functional metrics throughout training.
- The approach improves the solver by an average of +4.4 points for the 8B model, with the largest gains on competition-level benchmarks.
- The authors argue that adding explicit action-space constraints—analogous to game rules in classical self-play—can sustain productive co-evolution and make the curriculum informative for the solver.
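The core mechanism described above — randomly hard-masking a subset of the proposer's output logits, with the masked set resampled over time — can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function names, the uniform per-token drop rate, and the per-call resampling schedule are assumptions for clarity.

```python
import numpy as np

def vocabulary_dropout_logits(logits, drop_rate, rng):
    """Hard-mask a random subset of vocabulary logits to -inf.

    The masked set is resampled on every call, making the constraint
    non-stationary across generation steps and training iterations
    (illustrative choice; the paper's exact schedule may differ).
    """
    mask = rng.random(logits.shape[-1]) < drop_rate
    masked = logits.copy()
    masked[..., mask] = -np.inf  # dropped tokens get zero probability
    return masked

def sample_token(logits, drop_rate, rng):
    """Sample a token from the renormalized distribution over surviving tokens."""
    masked = vocabulary_dropout_logits(logits, drop_rate, rng)
    z = masked - masked.max()          # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(probs), p=probs))
```

Because the mask is redrawn each time, a proposer that has converged to one reward-satisfying phrasing is periodically forced onto alternative token paths, which is the diversity-preserving effect the paper targets.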