Context-Sensitive Abstractions for Reinforcement Learning with Parameterized Actions

arXiv cs.AI / April 27, 2026


Key Points

  • The paper addresses reinforcement learning (RL) for long-horizon, sparse-reward problems with parameterized action spaces that combine discrete choices and continuous parameters.
  • It argues that existing planning methods and standard RL algorithms are poorly suited to this mixed action setting, and that prior parameterized-action RL approaches often require domain-specific engineering and underuse the structure of these spaces.
  • The authors propose RL algorithms that learn state and action abstractions online, progressively refining them to add more detail only in the important regions of the state-action space (a minimal sketch of this refinement idea follows this list).
  • Experiments on multiple continuous-state, parameterized-action domains show that the abstraction-driven method markedly improves sample efficiency, with TD(λ) outperforming strong baselines.
  • Overall, the work extends RL to better exploit latent structure in parameterized-action environments without heavy manual modeling.

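To make the refinement idea concrete, here is a minimal Python sketch of one way to represent a parameterized action (a discrete choice plus a continuous parameter range) and to split an abstraction cell where value estimates are struggling. The data structures, the TD-error-based splitting rule, and all names here are illustrative assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch: a parameterized action and a progressively refined
# abstraction over its continuous parameter range. Names are illustrative.
from dataclasses import dataclass

@dataclass
class ParameterizedAction:
    """A discrete action choice plus bounds on its continuous parameter."""
    name: str    # discrete action, e.g. "kick"
    low: float   # lower bound of the continuous parameter
    high: float  # upper bound of the continuous parameter

@dataclass
class AbstractCell:
    """One cell of the abstraction: an interval of the parameter range."""
    low: float
    high: float
    value: float = 0.0         # learned value estimate for this cell
    td_error_sum: float = 0.0  # accumulated TD error observed in this cell
    visits: int = 0

def refine(cells, threshold):
    """Split cells whose average TD error is large, adding resolution only
    where coarse estimates appear to hurt performance (an assumed criterion)."""
    refined = []
    for c in cells:
        avg_err = c.td_error_sum / max(c.visits, 1)
        if abs(avg_err) > threshold and c.visits > 10:
            mid = 0.5 * (c.low + c.high)
            # Children inherit the parent's value as an initial estimate.
            refined.append(AbstractCell(c.low, mid, c.value))
            refined.append(AbstractCell(mid, c.high, c.value))
        else:
            refined.append(c)
    return refined

# Usage: start with one coarse cell spanning the parameter range of "kick",
# then call refine() periodically after collecting TD errors.
kick = ParameterizedAction("kick", low=0.0, high=1.0)
cells = [AbstractCell(kick.low, kick.high)]
```

Starting coarse and splitting on demand is what lets the agent spend samples on resolution only where it pays off, which is the intuition behind the sample-efficiency gains the paper reports.
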
Abstract

Real-world sequential decision-making often involves parameterized action spaces that require both a choice among discrete actions and a choice of the continuous parameters governing how an action is executed. Existing approaches exhibit severe limitations in this setting: planning methods demand hand-crafted action models; standard reinforcement learning (RL) algorithms are designed for either discrete or continuous actions, but not both; and the few RL methods that do handle parameterized actions typically rely on domain-specific engineering and fail to exploit the latent structure of these spaces. This paper extends the scope of RL algorithms to long-horizon, sparse-reward settings with parameterized actions by enabling agents to autonomously learn both state and action abstractions online. We introduce algorithms that progressively refine these abstractions during learning, adding fine-grained detail in the critical regions of the state-action space where greater resolution improves performance. Across several continuous-state, parameterized-action domains, our abstraction-driven approach enables TD(λ) to achieve markedly higher sample efficiency than state-of-the-art baselines.
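
For reference, the TD(λ) learner the abstract mentions can be sketched as standard tabular policy evaluation with accumulating eligibility traces, run over abstract states. The mapping `phi` from raw observations to abstract-state indices stands in for the learned abstraction, and the environment interface (`reset`, `step`, `sample_action`) is an assumption for illustration, not the paper's API.

```python
# Minimal tabular TD(lambda) with accumulating eligibility traces over
# abstract states. `phi` and the env interface are assumed for illustration.
from collections import defaultdict

def td_lambda(env, phi, episodes=100, alpha=0.1, gamma=0.99, lam=0.9):
    V = defaultdict(float)           # value estimate per abstract state
    for _ in range(episodes):
        traces = defaultdict(float)  # eligibility trace per abstract state
        obs = env.reset()
        done = False
        while not done:
            s = phi(obs)                             # current abstract state
            obs, reward, done = env.step(env.sample_action())
            s_next = phi(obs)                        # successor abstract state
            delta = reward + (0.0 if done else gamma * V[s_next]) - V[s]
            traces[s] += 1.0                         # accumulating trace
            for k in list(traces):
                V[k] += alpha * delta * traces[k]    # credit recent states
                traces[k] *= gamma * lam             # decay all traces
    return V
```

Because values are stored per abstract state rather than per raw observation, a coarser abstraction means fewer entries to fit, which is why pairing TD(λ) with a well-chosen abstraction can improve sample efficiency.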