On Tackling Complex Tasks with Reward Machines and Signal Temporal Logics

arXiv cs.AI / 4/17/2026


Key Points

  • The article presents a reinforcement learning (RL) control framework that extends Reward Machines (RMs) by incorporating Signal Temporal Logic (STL) formulas to generate reward signals for complex tasks.
  • By using STL, the method aims to represent rewards more efficiently while also steering training toward behaviors that satisfy formally specified requirements.
  • The authors propose an implementation that uses online STL monitoring algorithms to support the framework during learning.
  • The approach is evaluated in three case studies (MiniGrid, Cart-Pole, and a highway environment), each involving a non-trivial task.
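
To make the first key point concrete, here is a minimal sketch of a reward machine whose events are generated from STL-style robustness values. It is an illustration under stated assumptions, not the authors' implementation: the class `RewardMachine`, the helper `event_from_robustness`, the event names, and the toy task ("move right past x = 1, then return near x = 0") are all hypothetical. Each atomic predicate is scored by a signed margin that is positive exactly when the predicate holds.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# All names below are hypothetical and chosen for illustration only.
State = str
Event = str


@dataclass
class RewardMachine:
    initial: State
    # Maps (current state, event) to (next state, reward).
    transitions: Dict[Tuple[State, Event], Tuple[State, float]]
    state: State = field(init=False)

    def __post_init__(self) -> None:
        self.state = self.initial

    def step(self, event: Event) -> float:
        """Advance on an event; unmatched events keep the state and pay 0."""
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)
        )
        self.state = next_state
        return reward


def event_from_robustness(robustness: Dict[Event, float]) -> Event:
    """Emit the event with the largest positive robustness, else 'none'."""
    best = max(robustness, key=robustness.get)
    return best if robustness[best] > 0 else "none"


def run(xs: List[float]) -> Tuple[float, State]:
    """Feed a 1-D position trace through a fresh two-stage reward machine."""
    rm = RewardMachine(
        initial="u0",
        transitions={
            ("u0", "reached_right"): ("u1", 0.5),
            ("u1", "back_center"): ("done", 1.0),
        },
    )
    total = 0.0
    for x in xs:
        # Signed margins of the predicates x > 1 and |x| < 0.1.
        rob = {"reached_right": x - 1.0, "back_center": 0.1 - abs(x)}
        total += rm.step(event_from_robustness(rob))
    return total, rm.state


print(run([0.0, 0.5, 1.2, 0.6, 0.05]))
```

In this sketch the machine state records task progress, so the same observation (being near the center) earns a reward only after the first sub-goal has been reached.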

Abstract

We propose a Reinforcement Learning (RL) based control design framework for handling complex tasks. The approach extends the concept of Reward Machines (RMs) with Signal Temporal Logic (STL) formulas that can be used for event generation. Using STL not only allows a more efficient representation of rewards for complex tasks but also guides the training process toward behaviors that satisfy the specified requirements. We also propose an implementation of the framework that leverages online STL monitoring algorithms. We illustrate the framework with three case studies (MiniGrid, Cart-Pole, and highway environments) involving non-trivial tasks.
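
The abstract does not say which online monitoring algorithms are used, so the following is only a generic sketch of the idea. It assumes discrete time and a past-time bounded "always" property: "the signal stayed above a threshold for the last `window` samples". The robustness of that property is the minimum margin over the window, which a monotonic deque can maintain incrementally. The class name `SlidingMinMonitor` and its parameters are hypothetical.

```python
from collections import deque
from typing import Deque, Tuple


class SlidingMinMonitor:
    """Online robustness of 'signal > threshold over the last `window` samples'.

    Until `window` samples have arrived, the minimum is taken over the
    samples seen so far.
    """

    def __init__(self, threshold: float, window: int) -> None:
        self.threshold = threshold
        self.window = window
        self._t = 0
        # (time index, margin) pairs with strictly increasing margins.
        self._candidates: Deque[Tuple[int, float]] = deque()

    def update(self, value: float) -> float:
        margin = value - self.threshold
        # Older samples with a larger-or-equal margin can never be the minimum.
        while self._candidates and self._candidates[-1][1] >= margin:
            self._candidates.pop()
        self._candidates.append((self._t, margin))
        # Discard samples that have slid out of the window.
        while self._candidates[0][0] <= self._t - self.window:
            self._candidates.popleft()
        self._t += 1
        return self._candidates[0][1]


monitor = SlidingMinMonitor(threshold=0.0, window=3)
trace = [monitor.update(v) for v in [2.0, 1.0, 3.0, 4.0, 5.0]]
print(trace)
```

A positive return value means the property held over the current window, and its magnitude indicates how much slack there was. Each update runs in amortized constant time, which is what makes this kind of monitor usable inside a training loop.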