CurEvo: Curriculum-Guided Self-Evolution for Video Understanding

arXiv cs.CV / 4/30/2026

Key Points

  • The paper proposes CurEvo, a curriculum-guided self-evolution framework aimed at improving autonomous video understanding without human annotations.
  • It targets two weaknesses of prior self-evolution approaches, weak optimization control and unstructured difficulty progression, by dynamically regulating task difficulty, evaluation criteria, and data diversity according to model competence.
  • CurEvo implements a multi-dimensional adaptive QA system that jointly evolves question generation and answer evaluation across perception, recognition, and understanding dimensions to keep curriculum progression coherent and measurable.
  • Experiments across seven model backbones show consistent gains in benchmark accuracy and evaluator-based semantic scores on four VideoQA benchmarks.
  • Overall, the work reframes self-evolution as a feedback loop that aligns learning complexity with the model’s current capability, making improvement more reliable and structured.
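
The feedback loop described in these points can be illustrated with a small sketch. This is not the paper's algorithm: the summary does not state how competence is measured or how difficulty is adjusted, so the integer difficulty levels, the `promote_at` and `demote_at` thresholds, and the names `CurriculumState`, `next_state`, and `run_curriculum` are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical: difficulty is an integer level from 0 (easiest) to MAX_LEVEL.
MAX_LEVEL = 4


@dataclass(frozen=True)
class CurriculumState:
    level: int = 0


def next_state(state, competence, promote_at=0.8, demote_at=0.5):
    """Return the next curriculum state given a competence score in [0, 1].

    Assumed rule (not from the paper): raise the level when competence is
    high, lower it when competence is low, and hold it otherwise.
    """
    if not 0.0 <= competence <= 1.0:
        raise ValueError("competence must lie in [0, 1]")
    if competence >= promote_at:
        level = state.level + 1
    elif competence < demote_at:
        level = state.level - 1
    else:
        level = state.level
    # Clamp so the level never leaves the valid range.
    return CurriculumState(level=max(0, min(MAX_LEVEL, level)))


def run_curriculum(competence_per_round):
    """Replay a sequence of per-round competence scores and record levels."""
    state = CurriculumState()
    history = [state.level]
    for competence in competence_per_round:
        state = next_state(state, competence)
        history.append(state.level)
    return history


if __name__ == "__main__":
    # Two strong rounds, one middling round, one weak round, one strong round.
    print(run_curriculum([0.9, 0.9, 0.6, 0.3, 0.95]))  # [0, 1, 2, 2, 1, 2]
```

The hold band between the two thresholds is a design choice of this sketch: it keeps the level from oscillating when competence hovers near a single cutoff.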

Abstract

Recent advances in self-evolution video understanding frameworks have demonstrated the potential of autonomous learning without human annotations. However, existing methods often suffer from weakly controlled optimization and uncontrolled difficulty progression, as they lack structured guidance throughout the iterative learning process. To address these limitations, we propose CurEvo, a curriculum-guided self-evolution framework that introduces curriculum learning into self-evolution to achieve more structured and progressive model improvement. CurEvo dynamically regulates task difficulty, refines evaluation criteria, and balances data diversity according to model competence, forming a curriculum-guided feedback loop that aligns learning complexity with model capability. Built upon this principle, we develop a multi-dimensional adaptive QA framework that jointly evolves question generation and answer evaluation across perception, recognition, and understanding dimensions, ensuring coherent and measurable curriculum progression. Through this integration, CurEvo transforms weakly controlled self-evolution into a more structured learning process for autonomous video understanding. Across seven backbones, CurEvo consistently improves both benchmark accuracy and evaluator-based semantic score on four VideoQA benchmarks, validating the effectiveness of curriculum-guided self-evolution for video understanding.
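
One way to picture balancing data across the perception, recognition, and understanding dimensions is to give weaker dimensions a larger share of the next round's questions. The sketch below is only an illustration under that assumption; the abstract does not specify an allocation rule, and `allocate_questions`, its `floor` parameter, and the proportional-to-gap weighting are hypothetical.

```python
DIMENSIONS = ("perception", "recognition", "understanding")


def allocate_questions(competence, budget, floor=1):
    """Split a question budget across dimensions.

    Assumed rule (not from the paper): every dimension gets at least
    `floor` questions, and the rest is shared in proportion to the
    competence gap (1 - competence), so weaker dimensions get more.
    """
    if budget < floor * len(DIMENSIONS):
        raise ValueError("budget too small for the per-dimension floor")

    gaps = [max(0.0, 1.0 - competence[d]) for d in DIMENSIONS]
    total = sum(gaps)
    if total == 0:
        # No measurable gap anywhere: fall back to an even split.
        gaps = [1.0] * len(DIMENSIONS)
        total = float(len(DIMENSIONS))

    spare = budget - floor * len(DIMENSIONS)
    raw = [spare * g / total for g in gaps]
    counts = [floor + int(r) for r in raw]

    # Largest-remainder rounding: hand leftover questions to the
    # dimensions whose fractional share was cut off the most.
    leftover = budget - sum(counts)
    order = sorted(
        range(len(DIMENSIONS)),
        key=lambda i: raw[i] - int(raw[i]),
        reverse=True,
    )
    for i in order[:leftover]:
        counts[i] += 1

    return dict(zip(DIMENSIONS, counts))


if __name__ == "__main__":
    scores = {"perception": 0.9, "recognition": 0.7, "understanding": 0.4}
    print(allocate_questions(scores, budget=15))
    # {'perception': 2, 'recognition': 5, 'understanding': 8}
```

The per-dimension floor keeps every dimension represented in each round, so a dimension the model already handles well is still monitored rather than dropped.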