NL2SpaTiaL: Generating Geometric Spatio-Temporal Logic Specifications from Natural Language for Manipulation Tasks

arXiv cs.RO / 3/25/2026

💬 OpinionSignals & Early TrendsIdeas & Deep AnalysisModels & Research

Key Points

  • The paper proposes NL2SpaTiaL, a framework that converts natural-language instructions into verifiable Spatio-Temporal Logic specifications tailored for robotic manipulation tasks.
  • It addresses a key limitation of prior NL-to-Logic approaches by generating structured Hierarchical Logical Trees (HLTs), preventing semantic entanglement between nested temporal scopes and spatial relations.
  • The authors introduce a new NL-to-SpaTiaL dataset with explicit hierarchical supervision, built via a logic-first synthesis pipeline.
  • Experiments with open-weight LLMs show that the HLT-based formulation substantially outperforms “flat sequence” generation baselines, especially as logical depth increases.
  • The work aims to enable a scalable generate-and-test verification loop, using language-conditioned specifications to validate candidate robotic trajectories rigorously.

Abstract

While Temporal Logic provides a rigorous verification framework for robotics, it typically operates on trajectory-level signals and does not natively represent the object-centric geometric relations that are central to manipulation. Spatio-Temporal Logic (SpaTiaL) overcomes this by explicitly capturing geometric spatial requirements, making it a natural formalism for manipulation-task verification. Consequently, translating natural language (NL) into verifiable SpaTiaL specifications is a critical objective. Yet, existing NL-to-Logic methods treat specifications as flat sequences, entangling nested temporal scopes with spatial relations and causing performance to degrade sharply under deep nesting. We propose NL2SpaTiaL, a framework modeling specifications as Hierarchical Logical Trees (HLT). By generating formulas as structured HLTs in a single shot, our approach decouples semantic parsing from syntactic rendering, aligning with human compositional spatial reasoning. To support this, we construct, to the best of our knowledge, the first NL-to-SpaTiaL dataset with explicit hierarchical supervision via a logic-first synthesis pipeline. Experiments with open-weight LLMs demonstrate that our HLT formulation significantly outperforms flat-generation baselines across various logical depths. These results show that explicit HLT structure is critical for scalable NL-to-SpaTiaL translation, ultimately enabling a rigorous ``generate-and-test'' paradigm for verifying candidate trajectories in language-conditioned robotics. Project website: https://sites.google.com/view/nl2spatial