Self Paced Gaussian Contextual Reinforcement Learning

arXiv cs.LG / 3/26/2026

Key Points

  • The paper introduces Self-Paced Gaussian Curriculum Learning (SPGL), a self-paced curriculum method for contextual reinforcement learning that uses a closed-form update for Gaussian context distributions instead of expensive inner-loop optimization.
  • By avoiding costly numerical procedures, SPGL aims to improve scalability in high-dimensional context spaces while retaining the sample efficiency and adaptability of prior self-paced curriculum approaches.
  • The authors provide theoretical convergence guarantees for SPGL.
  • Experiments on contextual RL benchmarks (e.g., Point Mass, Lunar Lander, Ball Catching) show SPGL matches or outperforms existing curriculum methods, particularly in hidden context settings, with more stable context distribution convergence.

Abstract

Curriculum learning improves reinforcement learning (RL) efficiency by sequencing tasks from simple to complex. However, many self-paced curriculum methods rely on computationally expensive inner-loop optimizations, limiting their scalability in high-dimensional context spaces. In this paper, we propose Self-Paced Gaussian Curriculum Learning (SPGL), a novel approach that avoids costly numerical procedures by leveraging a closed-form update rule for Gaussian context distributions. SPGL maintains the sample efficiency and adaptability of traditional self-paced methods while substantially reducing computational overhead. We provide theoretical guarantees on convergence and validate our method across several contextual RL benchmarks, including the Point Mass, Lunar Lander, and Ball Catching environments. Experimental results show that SPGL matches or outperforms existing curriculum methods, especially in hidden context scenarios, and achieves more stable context distribution convergence. Our method offers a scalable, principled alternative for curriculum generation in challenging continuous and partially observable domains.
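
The abstract does not state the SPGL update rule itself, so the following is only a rough illustration of why Gaussian context distributions lend themselves to closed-form computation. It is a minimal sketch, not the authors' method. It shows two standard closed-form results for diagonal Gaussians: the KL divergence between two distributions, and a weighted maximum-likelihood fit to sampled contexts. The function names (`kl_diag_gaussians`, `weighted_gaussian_fit`), the diagonal-covariance restriction, and the use of per-context weights are all assumptions made for the example.

```python
import math


def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) between two diagonal Gaussians.

    Hypothetical helper: each argument is a list with one entry per
    context dimension (means and variances, not standard deviations).
    """
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return total


def weighted_gaussian_fit(contexts, weights, min_var=1e-6):
    """Closed-form weighted maximum-likelihood fit of a diagonal Gaussian.

    Hypothetical helper: `contexts` is a list of context vectors and
    `weights` holds one non-negative weight per context (for example,
    a weight derived from the return obtained in that context).
    `min_var` is an assumed floor that keeps the variance positive.
    """
    total_w = sum(weights)
    dim = len(contexts[0])
    mean = [
        sum(w * c[d] for c, w in zip(contexts, weights)) / total_w
        for d in range(dim)
    ]
    var = [
        max(
            sum(w * (c[d] - mean[d]) ** 2 for c, w in zip(contexts, weights))
            / total_w,
            min_var,
        )
        for d in range(dim)
    ]
    return mean, var


# Usage: fit a 1-D context distribution, then measure how far it moved
# from a unit Gaussian centered at zero.
mean, var = weighted_gaussian_fit([[0.0], [1.0], [2.0]], [1.0, 1.0, 2.0])
shift = kl_diag_gaussians(mean, var, [0.0], [1.0])
print(mean, var, shift)
```

Both functions reduce to a fixed number of arithmetic operations per context dimension, with no iterative solver. That is the general property a closed-form curriculum update can exploit in place of inner-loop numerical optimization.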