Forager: a lightweight testbed for continual learning with partial observability in RL

arXiv cs.LG / May 5, 2026


Key Points

  • Forager is introduced as a lightweight continual reinforcement learning (CRL) testbed that provides a large, partially observable environment without requiring an expensive simulation setup.
  • The work argues that prior CRL research has largely focused on mitigating loss of plasticity in otherwise fully observable settings, under-studying the effects of partial observability and the role of agents that use memory or recurrence.
  • Forager is built to have a constant memory footprint, making it practical for repeated, long-running experiments while remaining challenging for existing CRL agents (a toy illustration of this design follows the list).
  • Experiments show that agents still suffer from loss of plasticity, and that proposed mitigation methods help to some extent, but that state construction (building informative internal representations) is the most useful lever.
  • The paper also presents a Forager variant that can generate an endless stream of new tasks, making it easier to clearly expose the limitations of current CRL approaches.
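
To make the constant-memory, partially observable design concrete, here is a minimal Python sketch of a Forager-like environment. It is not the paper's actual implementation; the class name, grid size, aperture, and object counts are all illustrative assumptions. The world is a fixed-size toroidal grid scattered with food and poison, the agent sees only a small egocentric window, and every buffer is allocated once up front, so memory use does not grow with experience.

```python
import numpy as np

class ForagerLikeEnv:
    """Illustrative sketch only, not the actual Forager implementation.

    A fixed-size toroidal grid scattered with food (+1) and poison (-1).
    The agent observes only a small egocentric window (partial
    observability), and all buffers are allocated once up front, so the
    memory footprint stays constant no matter how long the agent runs.
    """

    def __init__(self, size=32, aperture=5, n_objects=40, seed=0):
        self.rng = np.random.default_rng(seed)
        self.size, self.aperture = size, aperture
        self.grid = np.zeros((size, size), dtype=np.int8)   # fixed-size world
        cells = self.rng.choice(size * size, n_objects, replace=False)
        self.grid.flat[cells] = self.rng.choice([1, -1], n_objects)
        self.pos = np.array([size // 2, size // 2])
        self.grid[tuple(self.pos)] = 0                      # spawn on an empty cell

    def _observe(self):
        # Egocentric window around the agent, wrapping at the torus edges.
        r = self.aperture // 2
        rows = (self.pos[0] + np.arange(-r, r + 1)) % self.size
        cols = (self.pos[1] + np.arange(-r, r + 1)) % self.size
        return self.grid[np.ix_(rows, cols)].copy()

    def step(self, action):
        moves = np.array([[-1, 0], [1, 0], [0, -1], [0, 1]])  # up/down/left/right
        self.pos = (self.pos + moves[action]) % self.size
        reward = float(self.grid[tuple(self.pos)])
        if reward != 0:
            # Consume the object and respawn one of the same type elsewhere,
            # keeping the object count (and memory use) constant.
            self.grid[tuple(self.pos)] = 0
            empty = np.flatnonzero(self.grid == 0)
            self.grid.flat[self.rng.choice(empty)] = np.int8(np.sign(reward))
        return self._observe(), reward
```

Because consumed objects respawn rather than accumulate, the simulator's state never grows with experience, which is the property that keeps arbitrarily long continual-learning runs cheap.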

Abstract

In continual reinforcement learning (CRL), good performance requires never-ending learning, acting, and exploration in a large, partially observable world. Most CRL experiments have focused on loss of plasticity (the inability to keep learning) in one-off experiments where some unobservable non-stationarity is added to classic fully observable MDPs. Further, these experiments rarely consider the role of partial observability and the importance of CRL agents that use memory or recurrence. One potential reason for this focus on mitigating loss of plasticity without considering partial observability is that many partially observable CRL environments are prohibitively expensive to run. In this paper, we introduce Forager, a lightweight, partially observable CRL environment with a constant memory footprint. We provide a set of experiments and sample tasks demonstrating that Forager is challenging for current CRL agents and yet also allows for in-depth study of those agents. We demonstrate that agents exhibit loss of plasticity, that proposed mitigations can help, and that the most useful lever is state construction. We conclude with a variant of Forager that generates an unending stream of new tasks to learn, clearly highlighting the limitations of current CRL agents.
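
The unending-task variant can be emulated on top of the toy class above. The sketch below is illustrative only and does not reproduce the paper's actual task generator: it periodically flips which objects are rewarding, so an agent must keep adapting. The random policy and the 100,000-step period are arbitrary placeholders.

```python
import numpy as np

env = ForagerLikeEnv(seed=1)      # toy class from the earlier sketch
rng = np.random.default_rng(1)
recent = 0.0                      # running reward sum for the current task

for t in range(1, 300_001):
    obs, reward = env.step(int(rng.integers(4)))  # random-policy placeholder
    recent += reward
    if t % 100_000 == 0:
        print(f"step {t}: mean reward over last task = {recent / 100_000:+.4f}")
        recent = 0.0
        env.grid *= -1            # "new task": food and poison swap roles
```

Substituting a learning agent for the random policy makes loss of plasticity directly visible in this setup: per-task reward should recover after each switch, and a failure to recover indicates the agent has stopped adapting.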