Rewind-IL: Online Failure Detection and State Respawning for Imitation Learning

arXiv cs.RO / 4/21/2026


Key Points

  • Rewind-IL is presented as a training-free online safeguard for imitation learning systems that use generative, action-chunked policies, targeting reliability issues when execution drifts away from demonstration behavior.
  • It uses a zero-shot failure detector based on TIDE (Temporal Inter-chunk Discrepancy Estimate) and calibrates decisions with split conformal prediction to reduce false triggers under benign changes.
  • When a failure is detected, Rewind-IL employs a “state respawning” mechanism that rewinds the robot to a semantically verified safe intermediate state and then restarts inference from a clean policy state.
  • The approach builds an offline recovery-checkpoint library using a vision-language model over demonstrations, then matches online execution to checkpoint features via a compact database constructed from a frozen policy encoder.
  • Experiments on long-horizon real and simulated manipulation tasks (including transfer to flow-matching action-chunked policies) indicate improved robustness by combining internal policy consistency checks with semantics-grounded recovery.
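To make the detection step above concrete, here is a minimal sketch of an inter-chunk discrepancy score with a split-conformal threshold. The function names, the L2 discrepancy metric, and the calibration-from-successful-rollouts setup are illustrative assumptions, not the paper's exact formulation of TIDE.

```python
import numpy as np

def tide_score(prev_chunk, curr_chunk, overlap):
    """Mean L2 discrepancy between the overlapping portions of two
    consecutive action chunks (a hypothetical stand-in for the paper's
    Temporal Inter-chunk Discrepancy Estimate).
    Chunks are arrays of shape (horizon, action_dim)."""
    tail = prev_chunk[-overlap:]   # end of the previous chunk
    head = curr_chunk[:overlap]    # start of the current chunk
    return float(np.mean(np.linalg.norm(tail - head, axis=-1)))

def conformal_threshold(cal_scores, alpha=0.05):
    """Split conformal threshold: the ceil((n+1)(1-alpha))/n empirical
    quantile of discrepancy scores collected on held-out successful
    rollouts, giving roughly a (1-alpha) bound on benign scores."""
    n = len(cal_scores)
    q = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return float(np.quantile(cal_scores, q))

def is_failure(score, tau):
    """Flag a failure only when the score exceeds the calibrated bound,
    which suppresses false triggers under benign feature drift."""
    return score > tau
```

A consistent policy should emit near-identical actions on the overlap, so benign scores stay small; calibration sets the trigger level without any failure data.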

Abstract

Imitation learning has enabled robots to acquire complex visuomotor manipulation skills from demonstrations, but deployment failures remain a major obstacle, especially for long-horizon action-chunked policies. Once execution drifts off the demonstration manifold, these policies often continue producing locally plausible actions without recovering from the failure. Existing runtime monitors either require failure data, over-trigger under benign feature drift, or stop at failure detection without providing a recovery mechanism. We present Rewind-IL, a training-free online safeguard framework for generative action-chunked imitation policies. Rewind-IL combines a zero-shot failure detector based on Temporal Inter-chunk Discrepancy Estimate (TIDE), calibrated with split conformal prediction, with a state-respawning mechanism that returns the robot to a semantically verified safe intermediate state. Offline, a vision-language model identifies recovery checkpoints in demonstrations, and the frozen policy encoder is used to construct a compact checkpoint feature database. Online, Rewind-IL monitors self-consistency in overlapping action chunks, tracks similarity to the checkpoint library, and, upon failure, rewinds execution to the latest verified safe state before restarting inference from a clean policy state. Experiments on real-world and simulated long-horizon manipulation tasks, including transfer to flow-matching action-chunked policies, demonstrate that policy-internal consistency coupled with semantically grounded respawning offers a practical route to improved reliability in imitation learning. Supplemental materials are available at https://sjay05.github.io/rewind-il
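The checkpoint-matching step in the abstract, where online execution is tracked against the offline recovery-checkpoint library, can be sketched as a nearest-neighbor lookup over normalized features. The class name, cosine-similarity metric, and `min_sim` gate are assumptions for illustration; in the paper the features would come from the frozen policy encoder and the checkpoints from a vision-language model pass over demonstrations.

```python
import numpy as np

class CheckpointLibrary:
    """Minimal sketch of a recovery-checkpoint feature database.
    Stores one feature vector per semantically verified safe state."""

    def __init__(self, features):
        # Row-normalize so similarity reduces to a dot product.
        f = np.asarray(features, dtype=float)
        self.features = f / np.linalg.norm(f, axis=1, keepdims=True)

    def match(self, query, min_sim=0.8):
        """Return (index, similarity) of the best-matching checkpoint,
        or (None, similarity) when no checkpoint is similar enough.
        The latest matched index is the rewind target on failure."""
        q = np.asarray(query, dtype=float)
        q = q / np.linalg.norm(q)
        sims = self.features @ q
        i = int(np.argmax(sims))
        if sims[i] >= min_sim:
            return i, float(sims[i])
        return None, float(sims[i])
```

During execution the monitor would call `match` on each encoded observation, remember the most recent verified checkpoint, and, once the detector fires, rewind to that state and restart inference from a clean policy state.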