Value Functions for Temporal Logic: Optimal Policies and Safety Filters

arXiv cs.RO / 5/5/2026

Key Points

The paper studies reach, avoid, and reach-avoid problems in the undiscounted infinite-horizon setting and shows that greedily maximizing an optimal Q-function does not always yield an optimal policy.
For reach-avoid tasks (equivalently, Until specifications), greedy Q-maximization can produce policies that postpone task completion indefinitely even though the value function is optimal.
Building on recent decompositions of temporal-logic value functions, the authors construct history-dependent (non-Markovian) policies that avoid this failure mode and prove optimality for nested Until, Globally, and Globally-Until specifications under a quantitative robustness metric.
The paper also shows that the Q-function can serve as a safety filter for complex temporal-logic specifications, extending prior results beyond simple avoid or reach-avoid tasks.
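
The deferral failure mode in the key points can be reproduced in a two-state toy problem. The sketch below is an illustration only, not the paper's construction: the states, the margins `L` (target) and `G` (safety), the dynamics table `NEXT`, and the standard undiscounted reach-avoid backup V(s) = min(g(s), max(l(s), max_a V(s'))) are all assumptions chosen for the example. Both actions at the start state have the same optimal Q-value, so a greedy argmax that breaks ties toward "stay" never reaches the target.

```python
# Toy deterministic reach-avoid problem (all names and numbers are illustrative).
# States: 0 = start, 1 = target (absorbing). Actions: 0 = "stay", 1 = "go".
STATES = [0, 1]
ACTIONS = [0, 1]
NEXT = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
L = {0: -1.0, 1: 1.0}  # target margin l(s): positive inside the target
G = {0: 1.0, 1: 1.0}   # safety margin g(s): positive while safe


def value_iteration(n_iters=50):
    # Undiscounted reach-avoid backup: V(s) = min(g(s), max(l(s), max_a V(s'))).
    v = {s: min(G[s], L[s]) for s in STATES}
    for _ in range(n_iters):
        v = {s: min(G[s], max(L[s], max(v[NEXT[s, a]] for a in ACTIONS)))
             for s in STATES}
    return v


def q_value(v, s, a):
    return min(G[s], max(L[s], v[NEXT[s, a]]))


def rollout_score(policy, s, horizon=20):
    # Until-style robustness of a finite rollout:
    # max over t of min(l(s_t), min over k <= t of g(s_k)).
    best, safe_so_far = float("-inf"), float("inf")
    for _ in range(horizon):
        safe_so_far = min(safe_so_far, G[s])
        best = max(best, min(L[s], safe_so_far))
        s = NEXT[s, policy(s)]
    return best


V = value_iteration()
Q0 = [q_value(V, 0, a) for a in ACTIONS]

# Python's max returns the first maximizer, so ties resolve to "stay".
greedy_first = lambda s: max(ACTIONS, key=lambda a: q_value(V, s, a))
# Breaking the tie toward "go" happens to work in this toy problem.
greedy_last = lambda s: max(reversed(ACTIONS), key=lambda a: q_value(V, s, a))

print(V[0], Q0)                        # optimal value and Q-values at the start
print(rollout_score(greedy_first, 0))  # greedy policy that defers forever
print(rollout_score(greedy_last, 0))   # policy that actually reaches the target
```

Running this prints `1.0 [1.0, 1.0]`, then `-1.0`, then `1.0`: the value at the start state is optimal, yet the first greedy policy scores -1.0 because it stays put indefinitely. Flipping the tie-break fixes this toy case only; the paper's remedy is a history-dependent policy, which this sketch does not implement.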

Abstract

While Bellman equations for basic reach, avoid, and reach-avoid problems are well studied, the relationship between value optimality and policy optimality becomes subtle in the undiscounted infinite-horizon setting, particularly for more complicated tasks. Greedily maximizing the Q-function can produce policies that indefinitely defer task completion for reach-avoid problems, or equivalently, Until specifications, even when the value function is optimal. Building upon recent results decomposing the value function for temporal logic (TL) into a graph of constituent value functions, we construct non-Markovian policies based on state history that avoid this pathology and prove their optimality with respect to the quantitative robustness score for nested Until, Globally, and Globally-Until specifications. We further show how the Q-function can serve as a safety filter for complex TL specifications, extending prior results beyond simple avoid or reach-avoid tasks.
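
As a rough illustration of the safety-filter idea, the sketch below covers only the simple avoid case that the abstract describes as prior work; it does not reproduce the paper's extension to complex TL specifications. The four-cell line world, the margin `G`, the helper `safety_filter`, and the zero threshold are all hypothetical choices for the example. The filter keeps a nominal action whenever its Q-value is non-negative and otherwise falls back to the Q-maximizing action.

```python
# Toy avoid problem on a line of four cells (all names and numbers are illustrative).
# Cell 3 is unsafe. Actions: -1 = left, +1 = right, clipped at the ends.
STATES = [0, 1, 2, 3]
ACTIONS = [-1, 1]
G = {0: 1.0, 1: 1.0, 2: 1.0, 3: -1.0}  # safety margin g(s): negative means unsafe


def step(s, a):
    return min(max(s + a, 0), len(STATES) - 1)


def avoid_value(n_iters=50):
    # Undiscounted avoid backup: V(s) = min(g(s), max_a V(s')).
    v = dict(G)
    for _ in range(n_iters):
        v = {s: min(G[s], max(v[step(s, a)] for a in ACTIONS)) for s in STATES}
    return v


V = avoid_value()


def q_value(s, a):
    return min(G[s], V[step(s, a)])


def safety_filter(s, nominal_action, threshold=0.0):
    # Keep the nominal action when its Q-value clears the threshold;
    # otherwise fall back to the action with the highest Q-value.
    if q_value(s, nominal_action) >= threshold:
        return nominal_action
    return max(ACTIONS, key=lambda a: q_value(s, a))


def run(horizon=10):
    s, visited = 0, [0]
    for _ in range(horizon):
        a = safety_filter(s, nominal_action=1)  # nominal policy: always move right
        s = step(s, a)
        visited.append(s)
    return visited


trajectory = run()
print(trajectory)
```

The nominal policy always moves right and would enter the unsafe cell on its third step. With the filter in place, the action is overridden only at cell 2, so the trajectory oscillates between cells 1 and 2 and never visits cell 3.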