Deep Reinforcement Learning for Robotic Manipulation under Distribution Shift with Bounded Extremum Seeking
arXiv cs.RO / 4/2/2026
Key Points
- The paper addresses a common limitation of reinforcement learning in robotics: performance drops when deployment conditions differ from the training distribution, especially in contact-rich manipulation tasks like pushing and pick-and-place.
- It proposes a hybrid control approach that combines a policy learned with deep deterministic policy gradient (DDPG) and a bounded extremum seeking (ES) component active during deployment.
- The RL policy generates fast manipulation behavior, while the bounded ES component maintains robustness to time variations and other shifts when the system moves out of distribution.
- Experiments evaluate the controller under multiple out-of-distribution scenarios, including time-varying goals and spatially varying friction patches.
- The overall contribution is a method to improve robustness of learned robotic manipulation policies under distribution shift without retraining for each deployment variation.