CoTEvol: Self-Evolving Chain-of-Thoughts for Data Synthesis in Mathematical Reasoning

arXiv cs.AI / 4/17/2026


Key Points

  • The paper introduces CoTEvol, a genetic evolutionary framework that treats Chain-of-Thought (CoT) generation as a population-based search over reasoning trajectories.
  • CoTEvol evolves candidate CoT trajectories using reflective global crossover at the trajectory level and uncertainty-guided local mutation at the step level, aiming for both holistic recombination and detailed refinement.
  • It uses lightweight, task-aware fitness functions to steer the evolutionary process toward reasoning that is both accurate and diverse.
  • Experiments on math tasks show an over-30% improvement in the success rate of synthesizing correct CoTs, greater structural diversity, and better efficiency than prior distillation and self-synthesis methods.
  • LLMs trained on CoTEvol-generated evolutionary CoT data achieve an average 6.6% gain across eight math benchmarks, indicating the approach can scale to improve mathematical reasoning performance.
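The evolutionary loop described in these points can be sketched as a small genetic algorithm over reasoning trajectories. The code below is purely illustrative: the function names, the placeholder mutation, and the fitness heuristic are assumptions for the sketch, not the authors' implementation, and the LLM calls CoTEvol would make for reflective crossover and step rewriting are stubbed out.

```python
import random

# A trajectory is a list of reasoning-step strings; the final step
# is assumed to contain the answer. All LLM calls are stubbed.

def fitness(trajectory, answer_checker):
    """Task-aware fitness sketch: reward correctness, plus a small
    diversity bonus (a stand-in for the paper's lightweight scoring)."""
    correct = answer_checker(trajectory[-1])
    return (1.0 if correct else 0.0) + 0.01 * len(set(trajectory))

def crossover(parent_a, parent_b):
    """Trajectory-level recombination: splice a prefix of one parent
    onto a suffix of the other (CoTEvol's reflective step is omitted)."""
    cut_a = random.randint(1, len(parent_a))
    cut_b = random.randint(0, len(parent_b))
    return parent_a[:cut_a] + parent_b[cut_b:]

def mutate(trajectory, uncertainty):
    """Step-level mutation: rewrite the most uncertain step.
    The rewrite is a placeholder; CoTEvol would query an LLM."""
    i = max(range(len(trajectory)), key=lambda k: uncertainty[k])
    new = trajectory.copy()
    new[i] = new[i] + " (revised)"
    return new

def evolve(population, answer_checker, generations=5, keep=4):
    """Population-based search: crossover + mutation, then keep the
    fittest trajectories for the next generation."""
    for _ in range(generations):
        children = []
        for _ in range(len(population)):
            pa, pb = random.sample(population, 2)
            child = crossover(pa, pb)
            child = mutate(child, uncertainty=[random.random()] * len(child))
            children.append(child)
        pool = population + children
        pool.sort(key=lambda t: fitness(t, answer_checker), reverse=True)
        population = pool[:keep]
    return population
```

Because selection is elitist (parents compete with children in the same pool), a correct trajectory, once found, survives across generations while crossover and mutation continue to diversify the rest of the population.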

Abstract

Large Language Models (LLMs) exhibit strong mathematical reasoning when trained on high-quality Chain-of-Thought (CoT) data that articulates intermediate steps, yet costly CoT curation hinders further progress. While existing remedies such as distillation from stronger LLMs and self-synthesis based on test-time search alleviate this issue, they often suffer from diminishing returns or high computational overhead. In this work, we propose CoTEvol, a genetic evolutionary framework that casts CoT generation as a population-based search over reasoning trajectories. Candidate trajectories are iteratively evolved through reflective global crossover at the trajectory level and uncertainty-guided local mutation at the step level, enabling holistic recombination and fine-grained refinement. Lightweight, task-aware fitness functions are designed to guide the evolutionary process toward accurate and diverse reasoning. Empirically, CoTEvol improves correct-CoT synthesis success by over 30% and enhances structural diversity, with markedly improved efficiency. LLMs trained on these evolutionary CoT data achieve an average gain of 6.6% across eight math benchmarks, outperforming previous distillation and self-synthesis approaches. These results underscore the promise of evolutionary CoT synthesis as a scalable and effective method for mathematical reasoning tasks.