Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents

arXiv cs.AI / 4/8/2026


Key Points

  • The paper presents STEP-HRL, a hierarchical reinforcement learning framework for LLM agents that learns from step-level transitions instead of requiring full, ever-growing interaction histories.
  • STEP-HRL represents global task progress with completed subtasks and uses a local progress module to iteratively and selectively summarize interaction history into compact local progress signals.
  • By creating augmented step-level transitions for both high-level and low-level policies, the approach aims to reduce computation while improving how agents generalize.
  • Experiments on ScienceWorld and ALFWorld show STEP-HRL outperforms baseline methods in performance and generalization while also reducing token usage.
  • The authors release code publicly via GitHub, enabling researchers to reproduce and extend the method.
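The central idea above — conditioning on compact progress signals rather than the full interaction history — can be sketched as a data structure. This is an illustrative mock-up, not the paper's implementation: the field names, the `AugmentedTransition` class, and the `summarize_local` helper are all hypothetical stand-ins (the actual local progress module is an LLM-based summarizer, not a truncation rule).

```python
from dataclasses import dataclass

@dataclass
class AugmentedTransition:
    """One step-level training example: the policy conditions on compact
    progress signals instead of the full, ever-growing history."""
    observation: str            # current environment observation
    global_progress: list[str]  # completed subtasks so far (high-level state)
    local_progress: str         # compact summary of in-subtask interactions
    action: str                 # action taken at this step
    reward: float               # step-level reward

def summarize_local(history: list[str], max_items: int = 3) -> str:
    """Hypothetical stand-in for the local progress module: selectively
    keep recent interactions rather than the whole history."""
    return " | ".join(history[-max_items:])

# Build a transition without storing the episode's full history.
history = ["open drawer", "see key", "take key", "go to door"]
t = AugmentedTransition(
    observation="You are at a locked door.",
    global_progress=["find key"],
    local_progress=summarize_local(history),
    action="unlock door with key",
    reward=1.0,
)
```

Because each transition is self-contained, both the high-level and low-level policies can be trained on fixed-size inputs, which is where the claimed token savings would come from.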

Abstract

Large language model (LLM) agents have demonstrated strong capabilities in complex interactive decision-making tasks. However, existing LLM agents typically rely on increasingly long interaction histories, resulting in high computational cost and limited scalability. In this paper, we propose STEP-HRL, a hierarchical reinforcement learning (HRL) framework that enables step-level learning by conditioning only on single-step transitions rather than full interaction histories. STEP-HRL structures tasks hierarchically, using completed subtasks to represent the global progress of the overall task. By introducing a local progress module, it also iteratively and selectively summarizes the interaction history within each subtask to produce a compact summary of local progress. Together, these components yield augmented step-level transitions for both high-level and low-level policies. Experimental results on the ScienceWorld and ALFWorld benchmarks consistently demonstrate that STEP-HRL substantially outperforms baselines in terms of performance and generalization while reducing token usage. Our code is available at https://github.com/TonyStark042/STEP-HRL.