Hierarchical Behaviour Spaces

arXiv cs.AI / 4/28/2026


Key Points

  • The paper proposes Hierarchical Behaviour Spaces (HBS), a hierarchical reinforcement learning method that represents each option via linear combinations of multiple predefined reward functions rather than a single reward function.
  • By having the controller learn weights for these reward-function mixtures, HBS can represent a more expressive set of policies and behaviours.
  • Experiments on the NetHack Learning Environment show that HBS achieves strong performance, validating the approach in a complex benchmark.
  • The authors find that, contrary to common intuition, the main advantage of hierarchy in HBS is improved exploration efficiency rather than longer-horizon reasoning.

Abstract

Recent work in hierarchical reinforcement learning has shown success in scaling to billions of timesteps when learning over a set of predefined option reward functions. We show that, instead of using a single reward function per option, the reward functions can be used to induce a space of behaviours by letting the controller specify linear combinations over them, allowing a more expressive set of policies to be represented. We call this method Hierarchical Behaviour Spaces (HBS). We evaluate HBS on the NetHack Learning Environment, demonstrating strong performance. We conduct a series of experiments and determine that, perhaps contrary to conventional wisdom, the benefits of hierarchy in our method come from increased exploration rather than long-term reasoning.
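The core mechanism can be illustrated with a minimal sketch. Instead of assigning each option one fixed reward function, the controller emits a weight vector and the option is trained on the weighted sum of a predefined reward basis. The function names, the toy state representation, and the NetHack-flavoured basis rewards below are assumptions for illustration, not the paper's actual API:

```python
# Toy sketch of reward-function mixing in the spirit of HBS.
# A few hand-written basis rewards over (state, next_state) transitions;
# the controller chooses mixture weights, which define the option's reward.

def make_reward_basis():
    """A few hypothetical predefined reward functions over (state, next_state)."""
    return [
        lambda s, s2: float(s2["gold"] - s["gold"]),    # reward collecting gold
        lambda s, s2: float(s2["depth"] - s["depth"]),  # reward descending
        lambda s, s2: -1.0,                             # constant time penalty
    ]

def mixed_reward(weights, basis, s, s2):
    """Linear combination r(s, s2) = sum_i w_i * r_i(s, s2)."""
    return sum(w * r(s, s2) for w, r in zip(weights, basis))

basis = make_reward_basis()
weights = [0.7, 0.3, 0.0]  # a controller-chosen point in the behaviour space
s  = {"gold": 10, "depth": 1}
s2 = {"gold": 15, "depth": 2}
r = mixed_reward(weights, basis, s, s2)  # 0.7*5 + 0.3*1 + 0.0*(-1) ≈ 3.8
```

Each weight vector corresponds to a distinct behaviour, so the controller selects from a continuous space of policies rather than a fixed discrete set of options.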