Quantum Hierarchical Reinforcement Learning via Variational Quantum Circuits

arXiv cs.LG / 5/6/2026


Key Points

  • The paper investigates whether the performance advantages of parametrized quantum computations in non-hierarchical reinforcement learning carry over to hierarchical reinforcement learning settings.
  • It proposes a hybrid hierarchical RL agent using the option-critic architecture, replacing multiple classical modules (feature extractors, option-value functions, termination functions, and intra-option policies) with variational quantum circuits.
  • Experiments on standard benchmarks indicate the quantum-enhanced hybrid agent can outperform classical baselines and reduce the number of trainable parameters by up to 66% when using a quantum feature extractor.
  • The authors find a key architectural bottleneck: quantum-based option-value estimation can significantly degrade performance, motivating careful module-level design.
  • Ablation studies further show that specific choices in the quantum circuit architecture materially affect results, leading to proposed design principles for parameter-efficient hybrid hierarchical agents.
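
To make the module breakdown above concrete, the four option-critic components can be laid out as separate heads on a shared feature vector. This is a minimal sketch in plain Python, not the authors' implementation: the class name `OptionCriticSketch`, the linear heads, the tanh features, and the dimensions used in the usage lines are all illustrative assumptions. In the hybrid agent described above, any one of these heads would be replaced by a variational quantum circuit.

```python
import math
import random


def linear(weights, x):
    """Apply a weight matrix (list of rows) to a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def init(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class OptionCriticSketch:
    """Hypothetical layout: four modules, each with its own parameters."""

    def __init__(self, obs_dim, feat_dim, n_options, n_actions, seed=0):
        rng = random.Random(seed)
        self.w_feat = init(feat_dim, obs_dim, rng)    # feature extractor
        self.w_q = init(n_options, feat_dim, rng)     # option-value function Q(s, o)
        self.w_beta = init(n_options, feat_dim, rng)  # termination function beta(s, o)
        # one intra-option policy per option
        self.w_pi = [init(n_actions, feat_dim, rng) for _ in range(n_options)]

    def forward(self, obs, option):
        phi = [math.tanh(v) for v in linear(self.w_feat, obs)]
        q_values = linear(self.w_q, phi)                         # one value per option
        beta = sigmoid(linear(self.w_beta, phi)[option])         # P(terminate current option)
        action_probs = softmax(linear(self.w_pi[option], phi))   # policy of current option
        return q_values, beta, action_probs

    def n_params(self):
        mats = [self.w_feat, self.w_q, self.w_beta] + self.w_pi
        return sum(len(row) for m in mats for row in m)


agent = OptionCriticSketch(obs_dim=4, feat_dim=8, n_options=2, n_actions=2)
q_values, beta, action_probs = agent.forward([1.0, 0.5, -0.5, 0.0], option=1)
print(agent.n_params())
```

Keeping each module behind its own parameters is what makes a module-level comparison possible: one head can be swapped at a time, and the trainable-parameter count recomputed for each variant.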

Abstract

Reinforcement learning is one of the most challenging learning paradigms, where efficacy and efficiency gains are extremely valuable. Hierarchical reinforcement learning is a variant that leverages temporal abstraction to structure decision-making. While parametrized quantum computations have shown success in non-hierarchical reinforcement learning, whether these advantages carry over to hierarchical decision-making remains a critical open question. In this work, we develop a hybrid hierarchical agent based on the option-critic architecture. This hybrid agent substitutes variational quantum circuits for classical components: feature extractors, option-value functions, termination functions, and intra-option policies. Evaluated on standard benchmarking environments, a hybrid agent with a quantum feature extractor can outperform classical baselines while using up to 66% fewer trainable parameters. We also identify an architectural bottleneck: quantum option-value estimation severely degrades performance. Further ablation studies reveal how architectural choices of the quantum circuits affect performance. Our work establishes design principles for parameter-efficient hybrid hierarchical agents.
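
For readers unfamiliar with variational quantum circuits, the idea of a quantum feature extractor can be illustrated with a small classical simulation. The sketch below is an assumption-laden example, not the circuit used in the paper: it assumes angle encoding with RY rotations, layers of trainable RY rotations followed by a chain of CNOT gates, and Pauli-Z expectation values as output features. The function names (`apply_ry`, `apply_cnot`, `expect_z`, `vqc_features`) are hypothetical. Because RY and CNOT have real matrix entries, the statevector stays real and plain floats suffice.

```python
import math


def apply_ry(state, qubit, theta):
    """Apply RY(theta) to one qubit of a real-valued statevector."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    out = state[:]
    bit = 1 << qubit
    for i in range(len(state)):
        if not i & bit:  # visit each amplitude pair once, from its |0> member
            a0, a1 = state[i], state[i | bit]
            out[i] = c * a0 - s * a1
            out[i | bit] = s * a0 + c * a1
    return out


def apply_cnot(state, control, target):
    """Flip the target qubit wherever the control qubit is 1."""
    out = state[:]
    cbit, tbit = 1 << control, 1 << target
    for i in range(len(state)):
        if i & cbit and not i & tbit:
            out[i], out[i | tbit] = state[i | tbit], state[i]
    return out


def expect_z(state, qubit):
    """Expectation value of Pauli-Z on one qubit, a number in [-1, 1]."""
    bit = 1 << qubit
    return sum((-1 if i & bit else 1) * a * a for i, a in enumerate(state))


def vqc_features(x, weights):
    """x: n input angles; weights: list of layers, each holding n trainable angles."""
    n = len(x)
    state = [0.0] * (1 << n)
    state[0] = 1.0  # start in |0...0>
    for q in range(n):
        state = apply_ry(state, q, x[q])          # angle encoding of the observation
    for layer in weights:
        for q in range(n):
            state = apply_ry(state, q, layer[q])  # trainable rotations
        for q in range(n - 1):
            state = apply_cnot(state, q, q + 1)   # entangle neighbouring qubits
    return [expect_z(state, q) for q in range(n)]


features = vqc_features([0.3, -1.2, 0.8], [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
print(features)
```

In this sketch the circuit has one trainable angle per qubit per layer, so the three-qubit, two-layer call above uses 6 trainable parameters. How the paper's circuits are actually structured, and how their parameter counts compare with the classical modules, is not specified in the text above.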