Pre-Execution Safety Gate & Task Safety Contracts for LLM-Controlled Robot Systems

arXiv cs.RO / 4/8/2026


Key Points

  • The paper argues that LLM-to-robot-code pipelines often lack validation to block unsafe or defective commands before they are executed on robots.
  • It proposes SafeGate, a neurosymbolic architecture that extracts safety-relevant structured properties from natural-language commands and uses a deterministic decision gate to authorize or reject execution.
  • To handle unsafe state transitions during runtime, it introduces Task Safety Contracts that decompose authorized commands into invariants, guards, and abort conditions.
  • The approach uses Z3 SMT solving to enforce constraint checking derived from the Task Safety Contracts and prevent violation-driven unsafe transitions.
  • Evaluation across 230 benchmark tasks, 30 AI2-THOR simulation scenarios, and real-world robot experiments shows SafeGate reduces acceptance of defective commands while preserving high acceptance of benign tasks.

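The key points above describe two mechanisms: a deterministic decision gate over safety properties extracted from the command, and Task Safety Contracts that decompose an authorized command into invariants, guards, and abort conditions. A minimal sketch of how these pieces might fit together (all property names, limits, and predicates here are illustrative assumptions, not taken from the paper):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Robot state snapshot during execution, e.g. {"speed": 0.2, "human_dist": 1.0}.
State = Dict[str, float]

@dataclass
class TaskSafetyContract:
    """Invariants, guards, and abort conditions for one authorized command."""
    invariants: List[Callable[[State], bool]] = field(default_factory=list)
    guards: List[Callable[[State], bool]] = field(default_factory=list)
    aborts: List[Callable[[State], bool]] = field(default_factory=list)

    def permits(self, state: State) -> bool:
        # A state transition is allowed only if every invariant and guard
        # holds and no abort condition has fired.
        return (all(p(state) for p in self.invariants)
                and all(g(state) for g in self.guards)
                and not any(a(state) for a in self.aborts))

def decision_gate(props: Dict[str, bool]) -> bool:
    """Deterministic pre-execution gate over extracted safety properties:
    reject if any hazard flag is set (flag names are hypothetical)."""
    hazard_flags = ("targets_human", "exceeds_payload", "restricted_zone")
    return not any(props.get(f, False) for f in hazard_flags)

# Gate a benign command, then monitor execution against its contract.
contract = TaskSafetyContract(
    invariants=[lambda s: s["speed"] <= 0.25],   # illustrative speed cap (m/s)
    aborts=[lambda s: s["human_dist"] < 0.3],    # abort if too close to a human
)
print(decision_gate({"targets_human": False}))               # authorized
print(contract.permits({"speed": 0.2, "human_dist": 1.0}))   # safe state
print(contract.permits({"speed": 0.2, "human_dist": 0.2}))   # abort fires
```

The split mirrors the paper's two-stage design: the gate runs once before any code is generated, while the contract is evaluated continuously as the robot's state evolves.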
Abstract

Large Language Models (LLMs) are increasingly used to convert task commands into robot-executable code; however, this pipeline lacks validation gates to detect unsafe and defective commands before they are translated into robot code. Furthermore, even commands that appear safe at the outset can produce unsafe state transitions during execution in the absence of continuous constraint monitoring. In this research, we introduce SafeGate, a neurosymbolic safety architecture that prevents unsafe natural-language task commands from reaching robot execution. Drawing from the ISO 13482 safety standard, SafeGate extracts structured safety-relevant properties from natural-language commands and applies a deterministic decision gate to authorize or reject execution. In addition, we introduce Task Safety Contracts, which decompose commands that pass through the gate into invariants, guards, and abort conditions to prevent unsafe state transitions during execution. We further incorporate Z3 SMT solving to enforce constraint checking derived from the Task Safety Contracts. We evaluate SafeGate against existing LLM-based robot safety frameworks and baseline LLMs across 230 benchmark tasks, 30 AI2-THOR simulation scenarios, and real-world robot experiments. Results show that SafeGate significantly reduces the acceptance of defective commands while maintaining high acceptance of benign tasks, demonstrating the importance of pre-execution safety gates for LLM-controlled robot systems.