Rethinking Easy-to-Hard: Limits of Curriculum Learning in Post-Training for Deductive Reasoning
arXiv cs.CL / 3/31/2026
Key Points
- The paper systematically tests curriculum learning (CL) for post-training of LLMs on synthetic arithmetic and logical benchmarks where difficulty is defined by reasoning complexity rather than surface proxies.
- Contrary to the intuition that ordering from easy to hard should improve generalization in compositional/deductive reasoning, the study finds no robust accuracy or response-length gains from difficulty-based example sequencing versus random sampling.
- The null result holds across multiple model families and curriculum schedules, indicating that it does not depend on a specific architecture or curriculum design.
- The findings persist under both supervised fine-tuning (SFT) and reinforcement learning (RL) post-training, suggesting that CL ordering offers limited practical value for compositional generalization in this setting.
- The authors conclude that, for deductive reasoning post-training, the specific order of training examples appears to play a negligible role in achieving compositional generalization, challenging common CL practices.
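The comparison at the heart of the study, difficulty-ordered versus randomly ordered training examples, can be sketched in a few lines. This is a minimal illustration, not the paper's actual pipeline: the `difficulty` field and the step-counting proxy are hypothetical stand-ins for whatever reasoning-complexity measure a given benchmark defines.

```python
import random

def order_examples(examples, strategy="random", seed=0):
    """Return a training order for a pool of {"prompt", "difficulty"} dicts.

    strategy="curriculum": easy-to-hard, sorted by the difficulty score
                           (the ordering the paper finds no robust gains from).
    strategy="random":     uniform shuffle (the baseline it is compared against).
    """
    rng = random.Random(seed)
    ordered = list(examples)
    if strategy == "curriculum":
        # Sort ascending by difficulty so training sees easy examples first.
        ordered.sort(key=lambda ex: ex["difficulty"])
    else:
        rng.shuffle(ordered)
    return ordered

# Toy pool where "difficulty" counts nested reasoning steps (a made-up proxy).
pool = [
    {"prompt": "((2+3)*4)-5", "difficulty": 3},
    {"prompt": "2+3",         "difficulty": 1},
    {"prompt": "(2+3)*4",     "difficulty": 2},
]

curriculum = order_examples(pool, strategy="curriculum")
baseline = order_examples(pool, strategy="random")
```

Under the paper's conclusion, downstream accuracy after post-training on `curriculum` versus `baseline` should be statistically indistinguishable, which is why the ordering step above adds little practical value.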