Rethinking Plasticity in Deep Reinforcement Learning

arXiv cs.LG / 2026/3/24


Key Points

  • The paper analyzes why plasticity loss occurs in deep reinforcement learning when neural networks fail to adapt to non-stationary environments over time.
  • It critiques prior descriptive metrics (e.g., dormant neurons, effective rank) for not explaining the true optimization dynamics behind learning breakdown.
  • The authors propose the Optimization-Centric Plasticity (OCP) hypothesis: optimal solutions for earlier tasks become poor local optima for new tasks, trapping parameters during transitions and preventing further learning.
  • They theoretically show an equivalence between neuron dormancy and zero-gradient states, arguing that lack of gradient signals is the core cause of dormancy.
  • Experiments indicate that plasticity loss is highly task-specific, and that parameter constraints reduce entrenchment in harmful local optima, helping restore plasticity across varied non-stationary RL scenarios.
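The claimed equivalence between neuron dormancy and zero-gradient states can be illustrated with a minimal numpy sketch (not the authors' code; all names and values here are illustrative). For a ReLU unit, a neuron whose pre-activation is negative on every input produces zero output, and the ReLU gate then zeroes the gradient flowing to its incoming weights, so a dormant neuron receives no learning signal:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))        # batch of inputs
W1 = rng.normal(size=(8, 4)) * 0.5  # input -> hidden weights
W2 = rng.normal(size=(4, 1)) * 0.5  # hidden -> output weights

# Force neuron 0 dormant: a large negative bias keeps its pre-activation < 0.
b1 = np.zeros(4)
b1[0] = -100.0

# Forward pass
Z = X @ W1 + b1
H = relu(Z)                         # hidden activations
y = H @ W2                          # scalar output per sample

# Backward pass for a simple loss L = mean(y)
dy = np.full_like(y, 1.0 / len(y))
dH = dy @ W2.T
dZ = dH * (Z > 0)                   # ReLU gate: zero wherever the neuron is inactive
dW1 = X.T @ dZ                      # gradient w.r.t. input weights

dormant = bool((H[:, 0] == 0).all())
grad_norm = float(np.abs(dW1[:, 0]).sum())
print(dormant, grad_norm)           # → True 0.0
```

The dormant neuron's weight-gradient column is exactly zero, matching the paper's argument that the absence of gradient signal, rather than lost capacity, is what sustains dormancy.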

Abstract

This paper investigates the fundamental mechanisms driving plasticity loss in deep reinforcement learning (RL), a critical challenge where neural networks lose their ability to adapt to non-stationary environments. While existing research often relies on descriptive metrics like dormant neurons or effective rank, these summaries fail to explain the underlying optimization dynamics. We propose the Optimization-Centric Plasticity (OCP) hypothesis, which posits that plasticity loss arises because optimal points from previous tasks become poor local optima for new tasks, trapping parameters during task transitions and hindering subsequent learning. We theoretically establish the equivalence between neuron dormancy and zero-gradient states, demonstrating that the absence of gradient signals is the primary driver of dormancy. Our experiments reveal that plasticity loss is highly task-specific; notably, networks with high dormancy rates in one task can achieve performance parity with randomly initialized networks when switched to a significantly different task, suggesting that the network's capacity remains intact but is inhibited by the specific optimization landscape. Furthermore, our hypothesis elucidates why parameter constraints mitigate plasticity loss by preventing deep entrenchment in local optima. Validated across diverse non-stationary scenarios, our findings provide a rigorous optimization-based framework for understanding and restoring network plasticity in complex RL domains.