AI Navigate

Preventing Curriculum Collapse in Self-Evolving Reasoning Systems

arXiv cs.LG / 3/17/2026

Key Points

  • Prism tackles diversity collapse in self-evolving reasoning by introducing a persistent diversity signal over semantic partitions and a Zone-of-Proximal-Development (ZPD) gate to preserve edge-of-solvability difficulty.
  • It encourages balanced exploration of underrepresented semantic regions across iterations, and identifies cross-iteration semantic coverage as a high-leverage, under-explored axis for improving self-evolving reasoners.
  • On seven mathematical reasoning benchmarks against five self-evolving baselines, Prism achieves the highest accuracy on six, with gains of +3.98 absolute points over R-Zero on AMC and +3.68 on Minerva Math.
  • The work produces the Prism-Math dataset of 100k mathematical questions; the authors release the code, dataset, and models.

Abstract

Self-evolving reasoning frameworks let LLMs improve their reasoning capabilities by iteratively generating and solving problems without external supervision, using verifiable rewards. Ideally, such systems are expected to explore a diverse problem space and propose new challenges of high learning value. While prior work has largely focused on solver-side optimisation and verification, recent evidence suggests that self-evolving systems can exhibit diversity collapse in posing new problems after just a few iterations, even when surface-level variation is preserved. We introduce Prism, a question-centric self-evolution method that directly tackles this collapse. Prism defines a persistent diversity signal over an embedding-induced semantic partition of mathematical problems and uses it to encourage balanced exploration of underrepresented regions across iterations. This coverage signal is combined with a Zone-of-Proximal-Development (ZPD) gate to preserve edge-of-solvability difficulty. Evaluated on seven widely used mathematical reasoning benchmarks against five self-evolving baselines, Prism achieves the highest accuracy on six out of seven tasks, achieving gains of +3.98 absolute points over R-Zero on AMC and +3.68 on Minerva Math. Prism also generates semantically diverse and challenging questions across iterations, resulting in the construction of the Prism-Math dataset comprising 100k mathematical questions. These results demonstrate that cross-iteration semantic coverage is a high-leverage and under-explored axis for building more capable self-evolving reasoners. We release the code, dataset, and models to facilitate further research.
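
The abstract names two ingredients: a persistent coverage signal over a semantic partition, and a ZPD gate on difficulty. The sketch below shows one way these could combine into a reward for a proposed question. It is an illustration under stated assumptions, not the authors' implementation: the inverse-square-root count bonus, the nearest-centroid partition, the solve-rate thresholds 0.2 and 0.8, and every name (CoverageTracker, zpd_gate, question_reward) are hypothetical choices made for this example.

```python
import math


def nearest_cluster(embedding, centroids):
    """Return the index of the centroid closest to `embedding` (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda i: sq_dist(embedding, centroids[i]))


class CoverageTracker:
    """Per-cluster visit counts that persist across iterations (assumed design)."""

    def __init__(self, num_clusters):
        self.counts = [0] * num_clusters

    def bonus(self, cluster):
        # Assumed form: rarely visited clusters receive a larger bonus.
        return 1.0 / math.sqrt(1 + self.counts[cluster])

    def update(self, cluster):
        self.counts[cluster] += 1


def zpd_gate(solve_rate, low=0.2, high=0.8):
    """Accept only questions whose empirical solve rate lies in [low, high].

    The thresholds are illustrative; the source does not state them.
    """
    return low <= solve_rate <= high


def question_reward(embedding, solve_rate, centroids, tracker):
    """Coverage bonus for questions that pass the difficulty gate, else 0."""
    if not zpd_gate(solve_rate):
        return 0.0  # too easy or too hard: no reward, counts left unchanged
    cluster = nearest_cluster(embedding, centroids)
    reward = tracker.bonus(cluster)
    tracker.update(cluster)
    return reward


# Toy usage: two clusters in a 2-D embedding space.
centroids = [(0.0, 0.0), (10.0, 10.0)]
tracker = CoverageTracker(num_clusters=2)

r1 = question_reward((0.1, 0.2), 0.5, centroids, tracker)  # first visit to cluster 0
r2 = question_reward((0.2, 0.1), 0.5, centroids, tracker)  # repeat visit to cluster 0
r3 = question_reward((9.0, 9.0), 0.5, centroids, tracker)  # first visit to cluster 1
r4 = question_reward((9.0, 9.0), 1.0, centroids, tracker)  # always solved: gated out
```

In this toy run the second question in cluster 0 earns less than the first, while the first question in cluster 1 earns the full bonus, which is the balancing behaviour the abstract describes. Because the counts live in the tracker rather than in a single batch, the signal carries over from one iteration to the next.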