GraSP-STL: A Graph-Based Framework for Zero-Shot Signal Temporal Logic Planning via Offline Goal-Conditioned Reinforcement Learning

arXiv cs.RO / 4/1/2026

Key Points

  • The paper introduces GraSP-STL, a graph-search-based framework for offline, zero-shot planning under Signal Temporal Logic (STL) specifications.
  • It assumes only an offline dataset of state-action-state transitions from a task-agnostic behavior policy, with no dynamics model, no additional environment interaction, and no task-specific retraining.
  • GraSP-STL learns a goal-conditioned value function from offline data to derive a finite-horizon reachability metric, then builds a directed state-graph abstraction whose edges represent feasible short-horizon transitions.
  • Planning is performed as a graph search over waypoint sequences, evaluated using arithmetic-geometric mean robustness with interval semantics, and then executed via the learned goal-conditioned policy.
  • The framework is designed to decouple reusable reachability learning from task-conditioned planning, enabling generalization to unseen STL tasks and longer-horizon behavior composition using short-horizon offline segments.
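The graph-construction step in the points above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `value_fn(s, g)` is a hypothetical stand-in for the learned goal-conditioned value function (higher meaning the goal is easier to reach within the short horizon), and the `threshold` that gates edge creation is an assumed hyperparameter.

```python
import itertools

def build_reachability_graph(states, value_fn, threshold):
    """Directed graph over representative states; an edge i -> j is added
    when the learned value judges states[j] reachable from states[i]
    within the short horizon (value clears the threshold)."""
    edges = {i: [] for i in range(len(states))}
    for i, j in itertools.permutations(range(len(states)), 2):
        if value_fn(states[i], states[j]) >= threshold:
            edges[i].append(j)
    return edges

# Toy example with 1-D states, using negative distance as a
# hypothetical stand-in for a value function learned offline.
states = [0.0, 1.0, 2.0, 5.0]
graph = build_reachability_graph(states, lambda s, g: -abs(s - g), threshold=-1.5)
```

In the toy run, only pairs within distance 1.5 are connected, so the isolated state at 5.0 receives no edges; in the paper's setting the metric instead comes from the value function trained on the offline dataset.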

Abstract

This paper studies offline, zero-shot planning under Signal Temporal Logic (STL) specifications. We assume access only to an offline dataset of state-action-state transitions collected by a task-agnostic behavior policy, with no analytical dynamics model, no further environment interaction, and no task-specific retraining. The objective is to synthesize a control strategy whose resulting trajectory satisfies an arbitrary unseen STL specification. To this end, we propose GraSP-STL, a graph-search-based framework for zero-shot STL planning from offline trajectories. The method learns a goal-conditioned value function from offline data and uses it to induce a finite-horizon reachability metric over the state space. Based on this metric, it constructs a directed graph abstraction whose nodes represent representative states and whose edges encode feasible short-horizon transitions. Planning is then formulated as a graph search over waypoint sequences, evaluated using arithmetic-geometric mean robustness and its interval semantics, and executed by a learned goal-conditioned policy. The proposed framework separates reusable reachability learning from task-conditioned planning, enabling zero-shot generalization to unseen STL tasks and long-horizon planning through the composition of short-horizon behaviors from offline data. Experimental results demonstrate its effectiveness on a range of offline STL planning tasks.
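To make the scoring step concrete, here is a small sketch of arithmetic-geometric mean (AGM) robustness for conjunction and disjunction, following the AGM semantics of Mehdipour et al. "Always" over a trajectory reduces to the conjunction case and "eventually" to the disjunction case; the interval-semantics machinery the paper uses on top of this is omitted.

```python
import math

def agm_and(rhos):
    """AGM robustness of a conjunction (sketch). If every argument is
    strictly positive, return the geometric mean of (1 + rho_i) minus 1;
    otherwise return the arithmetic mean of the non-positive arguments."""
    m = len(rhos)
    if all(r > 0 for r in rhos):
        return math.prod(1.0 + r for r in rhos) ** (1.0 / m) - 1.0
    return sum(min(r, 0.0) for r in rhos) / m

def agm_or(rhos):
    """AGM robustness of a disjunction, as the dual of agm_and."""
    return -agm_and([-r for r in rhos])

# "Always above 0" over per-step margins -> conjunction over the horizon:
rho_always = agm_and([0.5, 0.4, 0.6])        # positive: all steps satisfy
# "Eventually above 0" -> disjunction over the horizon:
rho_eventually = agm_or([-0.2, 0.3, -0.1])   # positive: one step satisfies
```

A planner can use scores like these to rank candidate waypoint sequences on the graph: unlike max/min robustness, the AGM form rewards satisfying the specification by a healthy margin at every step rather than only at the single worst step.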