Constrained Decoding for Safe Robot Navigation Foundation Models

arXiv cs.RO / 4/17/2026

Key Points

  • The paper proposes SafeDec, a constrained decoding framework for transformer-based robot navigation foundation models. It addresses the lack of explicit notions of behavioral correctness in purely data-driven policies.
  • SafeDec enforces safety requirements specified as Signal Temporal Logic (STL) formulas by shaping runtime action generation so that generated actions provably satisfy the STL constraints under assumed dynamics.
  • The approach works without retraining and is policy-agnostic, meaning it can be applied as an inference-time intervention to different underlying robot navigation foundation models.
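  • Experiments on CHORES benchmark tasks, run with state-of-the-art embodied navigation policies across hundreds of procedurally generated environments, show that SafeDec's decoding-time interventions are useful both for filtering unsafe actions and for conditional action generation.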
  • The method is designed for autoregressive (next-token/action) generation, integrating formal methods with foundation-model robotics for safer navigation behavior.
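
To make the idea of a decoding-time intervention concrete, here is a minimal sketch of hard action masking. It is an illustration, not the paper's implementation: the action set, the point-mass `step` dynamics, the circular obstacle, and all function names are assumptions. The policy's logits are taken as given, and any action whose predicted next state violates the atomic predicate of an invariant of the form G(dist > radius) is masked out before the greedy choice.

```python
import math

# Hypothetical discrete action set: name -> (dx, dy) displacement per step.
ACTIONS = {
    "forward": (1.0, 0.0),
    "back": (-1.0, 0.0),
    "left": (0.0, 1.0),
    "right": (0.0, -1.0),
    "stay": (0.0, 0.0),
}


def step(state, action):
    """Assumed dynamics: a 2D point that moves by a fixed offset per action."""
    dx, dy = ACTIONS[action]
    return (state[0] + dx, state[1] + dy)


def is_safe(state, obstacle, radius):
    """Atomic predicate of the invariant G(dist(state, obstacle) > radius)."""
    return math.dist(state, obstacle) > radius


def mask_logits(logits, state, obstacle, radius):
    """Set the logit of every action whose predicted next state is unsafe to -inf."""
    return {
        action: (score if is_safe(step(state, action), obstacle, radius) else -math.inf)
        for action, score in logits.items()
    }


def constrained_greedy_decode(logits, state, obstacle, radius):
    """Pick the highest-scoring action among those that keep the invariant."""
    masked = mask_logits(logits, state, obstacle, radius)
    best = max(masked, key=masked.get)
    if masked[best] == -math.inf:
        raise RuntimeError("no action satisfies the specification")
    return best


# The policy prefers "forward", but that step would enter the obstacle region,
# so the decoder falls back to the best remaining action.
policy_logits = {"forward": 2.0, "left": 1.0, "right": 0.5, "back": 0.1, "stay": 0.0}
chosen = constrained_greedy_decode(policy_logits, (0.0, 0.0), (1.5, 0.0), 1.0)
```

Because the mask only touches the logits, the underlying policy is left unchanged, which matches the no-retraining, policy-agnostic setting described above. Any guarantee from such a filter holds only as far as the assumed dynamics model is accurate.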

Abstract

Recent advances in the development of robotic foundation models have led to promising end-to-end and general-purpose capabilities in robotic systems. Trained on vast datasets of simulated and real-world trajectories, these policies map multimodal observations directly to action sequences for physical execution. Despite promising real-world capabilities, these models are still data-driven and, therefore, lack explicit notions of behavioral correctness. We address this gap by introducing SafeDec, a constrained decoding framework for autoregressive, transformer-based robot navigation foundation models that enforces safety specifications expressed as Signal Temporal Logic (STL) formulas. Our method ensures that generated actions provably satisfy STL specifications under assumed dynamics at runtime without retraining while remaining agnostic of the underlying policy. We evaluate SafeDec on tasks from the CHORES benchmark for state-of-the-art embodied navigation policies across hundreds of procedurally generated environments and show that our decoding-time interventions are useful not only for filtering unsafe actions but also for conditional action generation. Videos are available at constrained-robot-fms.github.io
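
For readers unfamiliar with STL, the sketch below shows the standard quantitative (robustness) semantics for two common temporal operators over a finite, discrete-time trajectory. It is a generic illustration rather than code from the paper; the function names, the distance-based predicates, and the example trajectory are assumptions. A positive robustness value means the trajectory satisfies the formula, and its magnitude indicates the margin.

```python
import math


def rho_always_avoid(traj, obstacle, radius):
    """Robustness of G(dist(x_t, obstacle) > radius): worst-case margin over time."""
    return min(math.dist(x, obstacle) - radius for x in traj)


def rho_eventually_reach(traj, goal, tol):
    """Robustness of F(dist(x_t, goal) < tol): best-case margin over time."""
    return max(tol - math.dist(x, goal) for x in traj)


def rho_and(*values):
    """Conjunction takes the minimum of the sub-formula robustness values."""
    return min(values)


# Hypothetical trajectory that passes below an obstacle and ends at the goal.
trajectory = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
avoid = rho_always_avoid(trajectory, obstacle=(1.0, 2.0), radius=1.0)
reach = rho_eventually_reach(trajectory, goal=(2.0, 0.0), tol=0.5)
overall = rho_and(avoid, reach)
```

A decoder could use such a score either as a hard filter (reject candidates with negative robustness) or as a soft preference among candidates; which variant fits depends on the specification and on how much the assumed dynamics can be trusted.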