SPELL: Self-Play Reinforcement Learning for Evolving Long-Context Language Models
arXiv cs.CL / 3/16/2026
Key Points
- SPELL is a multi-role self-play reinforcement learning framework that enables label-free optimization for long-context reasoning in LLMs by integrating a questioner, a responder, and a verifier within a single model.
- It uses an automated curriculum that gradually increases document length and an adaptive reward function to tailor question difficulty to the model's evolving capabilities, stabilizing training.
- Experiments on six long-context benchmarks show SPELL improves performance across diverse LLMs and outperforms equally sized models fine-tuned on annotated data, including a 7.6-point gain in pass@8 on Qwen3-30B-A3B-Thinking.
- The authors release the code on GitHub, enabling replication and broader experimentation.
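The loop described above (one model playing questioner, responder, and verifier, with a length curriculum and a difficulty-adaptive reward) can be sketched in miniature. This is an illustrative toy, not the paper's implementation: `ToyModel`, the role methods, and the reward shape (peaking when the responder succeeds about half the time) are all assumptions made for the sketch.

```python
import random

class ToyModel:
    """Stand-in for a single LLM playing all three SPELL roles (hypothetical API)."""
    def __init__(self, skill: float = 0.5):
        self.skill = skill  # responder's chance of answering correctly

    def ask(self, doc: str):
        # Questioner role: propose a question grounded in the document.
        return f"Q about: {doc[:16]}...", "gold"

    def answer(self, doc: str, question: str) -> str:
        # Responder role: stochastic stand-in for long-context reasoning.
        return "gold" if random.random() < self.skill else "wrong"

    def verify(self, doc: str, question: str, answer: str) -> bool:
        # Verifier role: judge the responder's answer.
        return answer == "gold"

def adaptive_questioner_reward(pass_rate: float, target: float = 0.5) -> float:
    """Assumed reward shape: highest when question difficulty matches the
    responder's current ability (pass rate near the target), discouraging
    questions that are trivially easy or impossibly hard."""
    return 1.0 - abs(pass_rate - target)

def self_play_step(model: ToyModel, documents: list, round_idx: int, n_rollouts: int = 8):
    # Automated curriculum: later rounds draw longer documents.
    doc = documents[min(round_idx, len(documents) - 1)]
    question, _ = model.ask(doc)
    verdicts = [model.verify(doc, question, model.answer(doc, question))
                for _ in range(n_rollouts)]
    pass_rate = sum(verdicts) / n_rollouts
    return adaptive_questioner_reward(pass_rate), pass_rate

# Demo: documents sorted short-to-long form the curriculum.
random.seed(0)
docs = ["a short document.", "a much longer document. " * 8]
reward, pass_rate = self_play_step(ToyModel(skill=0.5), docs, round_idx=0)
```

The key idea the sketch captures is that no labels are needed: the verifier's verdicts over repeated rollouts supply the training signal, and the reward steers the questioner toward questions at the frontier of the responder's ability.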
