To Do or Not to Do: Ensuring the Safety of Visuomotor Policies Learned from Demonstrations

arXiv cs.RO / 5/5/2026



Key Points

The paper argues that imitation learning (IL) policies are often evaluated only by task success, which is insufficient for field robotics where safety assurance is critical.
It introduces “execution guarantee,” a policy-agnostic safety measure that guarantees maximum task success for a visuomotor IL policy, despite minor run-time changes, when the policy is executed from within a specific region of the state space.
The method uses recent advances in view synthesis to identify the regions of the state space in which the guarantee holds for a given IL policy, and it grounds the analysis in set-invariance theory.
By applying Nagumo’s sub-tangentiality condition, the authors formalize and operationalize execution guarantee, enabling safer deployment of IL policies.
Experiments with a Franka robot, in simulation and in the real world, show that the safety analysis lets various IL policies achieve maximum task success with a guarantee. A recovery policy, obtained as a by-product of the analysis, further improves performance and mitigates the safety–performance tradeoff.
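
For readers unfamiliar with the set-invariance result named in the key points, the standard textbook form of Nagumo's sub-tangentiality condition can be sketched as follows. The symbols f, S, h, and T_S are notation assumed here for illustration; they are not taken from the paper, and how the paper instantiates the condition for visuomotor policies is not covered by this summary.

```latex
% Assumed setting: an autonomous system with a unique solution for every
% initial condition, and a closed candidate set S.
\[
  \dot{x} = f(x), \qquad S \subset \mathbb{R}^n \ \text{closed}.
\]
% Nagumo's condition: S is forward invariant if and only if the vector field
% lies in the tangent (Bouligand) cone T_S(x) at every boundary point of S.
\[
  S \ \text{is forward invariant}
  \iff
  f(x) \in T_S(x) \quad \text{for all } x \in \partial S .
\]
% Special case: if S = \{x : h(x) \le 0\} with h continuously differentiable
% and \nabla h(x) \neq 0 wherever h(x) = 0, the condition reduces to
\[
  \nabla h(x)^{\top} f(x) \le 0 \quad \text{for all } x \ \text{with } h(x) = 0 .
\]
```

Intuitively, a trajectory can never leave S if, at every boundary point, the dynamics point into the set or along its boundary.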

Abstract

Task success has historically been the primary measure of policy performance in imitation learning (IL) research. This characteristic strictly limits the widespread application of IL algorithms in field robotics, where safety assurance, in addition to task success, is of paramount importance. It is often desirable for an IL-powered robot in the field not to roll out a policy, and hence to score poorly on performance, if safety is not guaranteed. Although this trade-off between safety and performance is well investigated in the classical control literature, policy safety is a heavily underexplored domain in IL research. There is no universal definition of safety in IL. To make things worse, many existing theoretical works on safety are notoriously difficult to extend to IL-powered robots in the field. This paper offers important insights on the safety and performance of IL policies. We propose execution guarantee, a policy-agnostic safety measure that guarantees the maximum task success for a visuomotor IL policy, despite minor run-time changes, from within a specific region in the state space. We leverage recent advances in view synthesis to identify such regions in the state space for an IL policy and explore a fundamental result on set invariance, namely Nagumo's sub-tangentiality condition, to prove and operationalize execution guarantee from inside that region. Experiments with a Franka robot, both in simulation and in the real world, demonstrate how the proposed safety analysis allows various IL policies to achieve maximum task success with a guarantee. We also demonstrate some interesting results on how a recovery policy, a by-product of the proposed safety analysis, can help to increase policy performance and thereby mitigate the safety-performance trade-off in IL.
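
To make the set-invariance idea in the abstract concrete, here is a minimal toy sketch in Python. It is not the paper's method: the 2-D dynamics, the disk-shaped region, and every function name below are hypothetical choices made for illustration. The sketch samples points on the boundary of a disk and checks whether the vector field ever points outward there, which is what the sub-tangentiality condition rules out for a smooth region. A sampled check like this can expose a violation but does not by itself prove invariance.

```python
import numpy as np

def closed_loop_dynamics(x):
    """Hypothetical closed-loop vector field x' = A x (a stable spiral)."""
    A = np.array([[-1.0, 2.0],
                  [-2.0, -1.0]])
    return A @ x

def outward_dynamics(x):
    """Hypothetical unstable field x' = x, which pushes states away from the origin."""
    return x

def grad_h(x):
    """Gradient of h(x) = ||x||^2 - r^2, whose sublevel set h <= 0 is a disk."""
    return 2.0 * x

def satisfies_nagumo(f, radius=1.0, n_samples=360, tol=1e-9):
    """Check grad_h(x) . f(x) <= 0 at sampled points on the disk boundary."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    boundary = radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return all(grad_h(x) @ f(x) <= tol for x in boundary)

if __name__ == "__main__":
    print(satisfies_nagumo(closed_loop_dynamics))  # field points inward on the boundary
    print(satisfies_nagumo(outward_dynamics))      # field points outward on the boundary
```

For the spiral field the inner product on the unit circle is -2 at every boundary point, so the check passes; for the outward field it is +2, so the check fails. In a setting like the paper's, the state, the region, and the closed-loop dynamics would come from the learned policy and the robot rather than from a hand-written matrix.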