ACDC: Adaptive Curriculum Planning with Dynamic Contrastive Control for Goal-Conditioned Reinforcement Learning in Robotic Manipulation

arXiv cs.RO / 4/15/2026

💬 OpinionIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper introduces ACDC (Adaptive Curriculum Planning with Dynamic Contrastive Control) for goal-conditioned reinforcement learning in robotic manipulation, aiming to improve over experience-prioritization-based methods.
ACDC combines an Adaptive Curriculum (AC) planner that dynamically balances diversity-driven exploration and quality-driven exploitation using metrics like success rate and training progress.
The Dynamic Contrastive (DC) control component executes the planned curriculum via norm-constrained contrastive learning, using magnitude-guided experience selection to match the current learning focus.
Experiments on challenging robotic manipulation tasks report that ACDC outperforms state-of-the-art baselines in both sample efficiency and final task success rate.

Abstract

Goal-conditioned reinforcement learning has shown considerable potential in robotic manipulation; however, existing approaches remain limited by their reliance on prioritizing collected experience, resulting in suboptimal performance across diverse tasks. Inspired by human learning behaviors, we propose a more comprehensive learning paradigm, ACDC, which integrates multidimensional Adaptive Curriculum (AC) Planning with Dynamic Contrastive (DC) Control to guide the agent along a well-designed learning trajectory. More specifically, at the planning level, the AC component schedules the learning curriculum by dynamically balancing diversity-driven exploration and quality-driven exploitation based on the agent's success rate and training progress. At the control level, the DC component implements the curriculum plan through norm-constrained contrastive learning, enabling magnitude-guided experience selection aligned with the current curriculum focus. Extensive experiments on challenging robotic manipulation tasks demonstrate that ACDC consistently outperforms the state-of-the-art baselines in both sample efficiency and final task success rate.

RAG in Practice — Part 4: Chunking, Retrieval, and the Decisions That Break RAG

Dev.to

Why dynamically routing multi-timescale advantages in PPO causes policy collapse (and a simple decoupled fix) [R]

Reddit r/MachineLearning

How AI Interview Assistants Are Changing Job Preparation in 2026

Dev.to

Consciousness in Artificial Intelligence: Insights from the Science ofConsciousness

Dev.to

NEW PROMPT INJECTION

Dev.to

ACDC: Adaptive Curriculum Planning with Dynamic Contrastive Control for Goal-Conditioned Reinforcement Learning in Robotic Manipulation

Key Points

Abstract

Related Articles

RAG in Practice — Part 4: Chunking, Retrieval, and the Decisions That Break RAG

Why dynamically routing multi-timescale advantages in PPO causes policy collapse (and a simple decoupled fix) [R]

How AI Interview Assistants Are Changing Job Preparation in 2026

Consciousness in Artificial Intelligence: Insights from the Science ofConsciousness

NEW PROMPT INJECTION

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer