ACDC: Adaptive Curriculum Planning with Dynamic Contrastive Control for Goal-Conditioned Reinforcement Learning in Robotic Manipulation

arXiv cs.RO / 4/15/2026


Key Points

  • The paper introduces ACDC (Adaptive Curriculum Planning with Dynamic Contrastive Control) for goal-conditioned reinforcement learning in robotic manipulation, aiming to improve over experience-prioritization-based methods.
  • At the planning level, ACDC's Adaptive Curriculum (AC) component schedules the learning curriculum, dynamically balancing diversity-driven exploration and quality-driven exploitation based on the agent's success rate and training progress.
  • The Dynamic Contrastive (DC) control component executes the planned curriculum via norm-constrained contrastive learning, using magnitude-guided experience selection to match the current learning focus.
  • Experiments on challenging robotic manipulation tasks report that ACDC outperforms state-of-the-art baselines in both sample efficiency and final task success rate.
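The summary does not give the AC planner's actual scheduling rule. As a minimal sketch only, assuming a simple linear blend in which the exploitation weight grows with success rate and training progress, the balance between diversity-driven and quality-driven experience scores could look like the following. The functions `curriculum_weights` and `priority_scores`, the parameter `alpha`, and the per-transition `diversity` and `quality` scores are hypothetical and are not taken from the paper.

```python
from __future__ import annotations

from typing import Sequence


def curriculum_weights(success_rate: float, progress: float, alpha: float = 0.5) -> tuple[float, float]:
    """Hypothetical AC-style schedule: return (w_diversity, w_quality).

    Assumption (not from the paper): the exploitation weight is a convex
    combination of success rate and training progress, both in [0, 1],
    so the two weights always sum to 1.
    """
    for name, value in (("success_rate", success_rate), ("progress", progress), ("alpha", alpha)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    w_quality = alpha * success_rate + (1.0 - alpha) * progress
    return 1.0 - w_quality, w_quality


def priority_scores(diversity: Sequence[float], quality: Sequence[float],
                    success_rate: float, progress: float) -> list[float]:
    """Blend hypothetical per-transition diversity and quality scores."""
    if len(diversity) != len(quality):
        raise ValueError("diversity and quality must have the same length")
    w_div, w_q = curriculum_weights(success_rate, progress)
    return [w_div * d + w_q * q for d, q in zip(diversity, quality)]


# Early training favors exploration; later training shifts toward exploitation.
print(curriculum_weights(0.0, 0.0))  # (1.0, 0.0)
print(curriculum_weights(0.8, 0.9))  # approximately (0.15, 0.85)
```

A linear blend is chosen here only because it is the simplest monotone schedule; the paper's multidimensional planner may use different signals or a nonlinear rule.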

Abstract

Goal-conditioned reinforcement learning has shown considerable potential in robotic manipulation; however, existing approaches remain limited by their reliance on prioritizing collected experience, resulting in suboptimal performance across diverse tasks. Inspired by human learning behaviors, we propose a more comprehensive learning paradigm, ACDC, which integrates multidimensional Adaptive Curriculum (AC) Planning with Dynamic Contrastive (DC) Control to guide the agent along a well-designed learning trajectory. More specifically, at the planning level, the AC component schedules the learning curriculum by dynamically balancing diversity-driven exploration and quality-driven exploitation based on the agent's success rate and training progress. At the control level, the DC component implements the curriculum plan through norm-constrained contrastive learning, enabling magnitude-guided experience selection aligned with the current curriculum focus. Extensive experiments on challenging robotic manipulation tasks demonstrate that ACDC consistently outperforms the state-of-the-art baselines in both sample efficiency and final task success rate.
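The abstract names norm-constrained contrastive learning and magnitude-guided experience selection without giving their formulation. The sketch below is one hypothetical reading, not the authors' method: experience embeddings are clipped to a maximum L2 norm so that dot-product logits stay bounded, a standard InfoNCE loss stands in for the paper's contrastive objective, and transitions are then ranked by embedding magnitude, taking the largest norms when the curriculum focus is quality and the smallest when it is diversity. The names `clip_norm`, `info_nce`, `select_by_magnitude`, `max_norm`, `temperature`, and the focus-to-ordering mapping are all assumptions.

```python
import numpy as np


def clip_norm(z: np.ndarray, max_norm: float = 1.0, eps: float = 1e-12) -> np.ndarray:
    """Rescale each row of z so its L2 norm is at most max_norm (hypothetical norm constraint)."""
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    scale = np.minimum(1.0, max_norm / np.maximum(norms, eps))
    return z * scale


def info_nce(anchors: np.ndarray, positives: np.ndarray, temperature: float = 0.1,
             max_norm: float = 1.0) -> float:
    """Standard InfoNCE on norm-clipped embeddings; row i of positives pairs with row i of anchors."""
    a = clip_norm(anchors, max_norm)
    p = clip_norm(positives, max_norm)
    # Clipping bounds every dot-product logit by max_norm**2 / temperature.
    logits = a @ p.T / temperature
    # Numerically stable log-softmax over each row.
    m = logits.max(axis=1, keepdims=True)
    log_probs = logits - (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True)))
    return float(-np.mean(np.diag(log_probs)))


def select_by_magnitude(z: np.ndarray, k: int, focus: str) -> np.ndarray:
    """Pick k row indices by embedding norm (hypothetical mapping of curriculum focus to ordering)."""
    norms = np.linalg.norm(z, axis=1)
    order = np.argsort(norms)  # ascending norm
    if focus == "quality":
        return order[::-1][:k]  # largest magnitudes first
    if focus == "diversity":
        return order[:k]  # smallest magnitudes first
    raise ValueError(f"unknown focus: {focus!r}")


# Toy embeddings that already satisfy the norm bound (norms ~1.0, 0.5, 0.7).
z = np.array([[0.6, 0.8], [0.3, 0.4], [0.7, 0.0]])
print(select_by_magnitude(z, 1, "quality"))    # [0]
print(select_by_magnitude(z, 1, "diversity"))  # [1]
```

Using raw dot products rather than cosine similarity lets the embedding magnitude influence the loss, which is what makes a magnitude-based selection signal meaningful in this sketch; how ACDC actually couples the norm to the curriculum focus is not described in this summary.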