Robust Quadruped Locomotion via Evolutionary Reinforcement Learning

arXiv cs.RO / 4/9/2026


Key Points

  • The paper investigates why deep reinforcement learning (DDPG/TD3) policies for quadruped walking in simulation often fail when the physical environment differs from training conditions.
  • It evaluates four approaches—standard deep RL (DDPG, TD3) and two evolutionary reinforcement learning variants (CEM-DDPG, CEM-TD3)—trained on flat terrain and tested on both flat and unseen rough terrain.
  • TD3 is reported as the strongest among standard deep RL baselines on flat ground, while CEM-TD3 attains the highest overall training and evaluation rewards.
  • On rough-terrain transfer, standard deep RL methods experience a sharp performance drop, whereas the evolutionary variants retain substantially more locomotion capability.
  • The results suggest evolutionary search components can mitigate overfitting and improve robustness for deployment in changing or unobserved terrains.

Abstract

Deep reinforcement learning has recently achieved strong results in quadrupedal locomotion, yet policies trained in simulation often fail to transfer when the environment changes. Evolutionary reinforcement learning aims to address this limitation by combining gradient-based policy optimisation with population-driven exploration. This work evaluates four methods on a simulated walking task: DDPG, TD3, and two Cross-Entropy-Method variants, CEM-DDPG and CEM-TD3. All agents are trained on flat terrain and later tested both on this domain and on a rough terrain not encountered during training. TD3 performs best among the standard deep RL baselines on flat ground with a mean reward of 5927.26, while CEM-TD3 achieves the highest rewards overall during training and evaluation (17611.41). Under the rough-terrain transfer test, performance of the deep RL methods drops sharply: DDPG achieves -1016.32 and TD3 achieves -99.73, whereas the evolutionary variants retain much of their capability. CEM-TD3 records the strongest transfer performance with a mean reward of 19574.33. These findings suggest that incorporating evolutionary search can reduce overfitting and improve policy robustness in locomotion tasks, particularly when deployment conditions differ from those seen during training.
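The population-driven exploration that CEM-DDPG and CEM-TD3 layer on top of gradient updates can be sketched as a plain Cross-Entropy Method over policy parameters. The sketch below is an illustrative assumption, not the paper's implementation: the toy quadratic objective, function name `cem_search`, and hyperparameters are invented for clarity, and a real CEM-TD3 would score each sampled parameter vector by rolling out an actor network in the environment and would additionally apply TD3 gradient steps to some population members.

```python
import random

def cem_search(objective, dim, pop_size=50, elite_frac=0.2, iters=30, seed=0):
    """Minimal Cross-Entropy Method: keep a diagonal Gaussian over
    parameter vectors, sample a population, and refit the Gaussian to
    the top-scoring elites each generation."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [1.0] * dim
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(iters):
        # Sample a population of candidate parameter vectors.
        pop = [[rng.gauss(m, s) for m, s in zip(mean, std)]
               for _ in range(pop_size)]
        # Keep the highest-scoring fraction as elites.
        elites = sorted(pop, key=objective, reverse=True)[:n_elite]
        # Refit mean and (floored) std to the elite set.
        mean = [sum(e[i] for e in elites) / n_elite for i in range(dim)]
        std = [max(1e-3, (sum((e[i] - mean[i]) ** 2 for e in elites)
                          / n_elite) ** 0.5) for i in range(dim)]
    return mean

# Toy objective: reward peaks when the parameters match a target vector,
# standing in for the episodic return a locomotion rollout would provide.
target = [1.0, -2.0, 0.5]
best = cem_search(lambda p: -sum((x - t) ** 2 for x, t in zip(p, target)),
                  dim=3)
```

Because the search distribution is refit only to high-reward samples rather than to a single gradient direction, this kind of population update is less prone to locking onto terrain-specific behaviours, which is the intuition behind the rough-terrain robustness reported above.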