Evolution Strategies for Deep RL Pretraining
arXiv cs.LG / 4/2/2026
📰 News
Key Points
- The paper compares evolution strategies (ES), a derivative-free optimization method, with deep reinforcement learning (DRL) on tasks of increasing difficulty including Flappy Bird, Breakout, and MuJoCo environments.
- It finds that ES does not consistently outperform DRL in training speed, despite being simpler to deploy and potentially less computationally costly.
- When ES is used as a preliminary pretraining step, it improves downstream DRL only in less complex settings (notably Flappy Bird), while providing minimal or no gains on harder tasks such as Breakout and the MuJoCo Walker.
- Overall, the study suggests ES may be limited as a general-purpose pretraining accelerator for more demanding deep RL workloads, and that its effectiveness depends strongly on task complexity.
- The results raise questions about the suitability of ES for scaling to the most challenging decision-making problems where DRL excels.
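To make "derivative-free optimization" concrete, the following is a minimal sketch of a vanilla evolution strategy of the kind popularized for RL: sample Gaussian perturbations of the parameters, evaluate the reward of each perturbed candidate, and move the parameters along the reward-weighted perturbation directions. The objective, hyperparameters, and function names here are illustrative assumptions for a toy problem, not taken from the paper.

```python
import numpy as np

def evolution_strategy(reward_fn, theta, iterations=200, pop_size=50,
                       sigma=0.1, lr=0.05, seed=0):
    """Vanilla ES: estimate an ascent direction from random perturbations,
    using only reward evaluations (no gradients of reward_fn)."""
    rng = np.random.default_rng(seed)
    for _ in range(iterations):
        # One Gaussian perturbation per population member.
        eps = rng.standard_normal((pop_size, theta.size))
        rewards = np.array([reward_fn(theta + sigma * e) for e in eps])
        # Subtract the mean reward as a baseline to reduce variance,
        # then step along the reward-weighted perturbations.
        centered = rewards - rewards.mean()
        theta = theta + (lr / (pop_size * sigma)) * (eps.T @ centered)
    return theta

# Toy stand-in for an RL return: maximized at theta == 3 in every coordinate.
def toy_reward(theta):
    return -np.sum((theta - 3.0) ** 2)

theta = evolution_strategy(toy_reward, np.zeros(5))
```

Because the update needs only black-box reward evaluations, each population member can be rolled out on a separate worker, which is the usual argument for ES being simpler and cheaper to scale than gradient-based DRL.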