On the Hardness of Reinforcement Learning with Transition Look-Ahead

arXiv stat.ML / 3/31/2026


Key Points

  • The paper studies reinforcement learning where the agent can look ahead by observing which next states would be reached after executing any length-ℓ action sequence before choosing an action.
  • It shows that while transition look-ahead can greatly improve achievable RL performance, computing the optimal use of this information can be prohibitively expensive.
  • For one-step look-ahead (ℓ=1), the authors give a polynomial-time solution via a new linear programming formulation.
  • For multi-step look-ahead (ℓ≥2), the optimal planning problem is proven to be NP-hard, establishing an explicit tractability boundary.

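To build intuition for why transition look-ahead helps, here is a minimal toy sketch (the MDP, probabilities, and all names are my own illustration, not from the paper). Two actions each lead to a good next state with probability 0.5, independently. An agent that must commit in advance succeeds with probability 0.5, while an agent with one-step look-ahead observes the realized next state of each action before choosing and succeeds whenever at least one action works out, i.e. with probability 0.75:

```python
import random

def sample_outcome(p_good, rng):
    """Realized next-state value of one action: 1.0 (good state) w.p. p_good, else 0.0."""
    return 1.0 if rng.random() < p_good else 0.0

def compare(trials=100_000, seed=0):
    rng = random.Random(seed)
    reward_blind, reward_lookahead = 0.0, 0.0
    for _ in range(trials):
        # Pre-sample the next state each of the two actions would reach.
        outcomes = [sample_outcome(0.5, rng) for _ in range(2)]
        reward_blind += outcomes[0]       # commits to action 0 without looking
        reward_lookahead += max(outcomes) # observes both outcomes, then picks the best
    return reward_blind / trials, reward_lookahead / trials

v_blind, v_la = compare()
print(f"without look-ahead: {v_blind:.3f}, with one-step look-ahead: {v_la:.3f}")
```

This only illustrates the value of the information; it says nothing about how to plan with it optimally, which is exactly where the paper's LP formulation (ℓ=1) and NP-hardness result (ℓ≥2) come in.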
Abstract

We study reinforcement learning (RL) with transition look-ahead, where the agent may observe which states would be visited upon playing any sequence of ℓ actions before deciding its course of action. While such predictive information can drastically improve the achievable performance, we show that using this information optimally comes at a potentially prohibitive computational cost. Specifically, we prove that optimal planning with one-step look-ahead (ℓ=1) can be solved in polynomial time through a novel linear programming formulation. In contrast, for ℓ≥2, the problem becomes NP-hard. Our results delineate a precise boundary between tractable and intractable cases for the problem of planning with transition look-ahead in reinforcement learning.