Scaling Cross-Environment Failure Reasoning Data for Vision-Language Robotic Manipulation
arXiv cs.RO / 4/1/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper proposes an automatic framework that scales up diverse robotic manipulation failure cases across simulation and real-world settings by perturbing successful trajectories so that the resulting rollouts match realistic failure distributions (a minimal sketch of this idea follows the list).
- It introduces FailCoT, a large-scale failure-reasoning dataset built from RLBench and BridgeDataV2, in which a vision-language model generates structured step-by-step reasoning traces for each episode (an illustrative record layout is sketched after the list).
- Using FailCoT, the authors train Guardian, a multi-view reasoning VLM that unifies planning and execution verification for robust failure detection and recovery (see the verification-loop sketch below the list).
- Guardian achieves state-of-the-art results on three unseen real-world benchmarks (RoboFail, RoboVQA, and a newly introduced UR5-Fail).
- When combined with an LLM-based manipulation policy, Guardian reliably improves task success rates in both simulation and real-world deployments, highlighting the importance of high-quality failure reasoning data for generalization.
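The paper's pipeline is not reproduced here, but a minimal Python sketch of the idea in the first key point, perturbing a successful trajectory until it lands in a realistic failure mode, could look as follows. Every function name, failure mode, and magnitude is an illustrative assumption, not the authors' implementation.

```python
# Illustrative sketch only: perturb a successful end-effector trajectory so the
# rollout fails in a chosen, realistic way. Names and magnitudes are assumptions.
import numpy as np

def perturb_trajectory(traj, failure_mode, rng):
    """traj: (T, 8) array of end-effector poses (7 dims) plus gripper state (1 dim)."""
    traj = np.array(traj, dtype=float, copy=True)
    if failure_mode == "grasp_offset":
        # Shift the final approach laterally so the gripper misses the object.
        traj[-10:, :2] += rng.choice([-1.0, 1.0], size=2) * rng.uniform(0.02, 0.05, size=2)
    elif failure_mode == "premature_release":
        # Open the gripper (last column) before the place pose is reached.
        release_step = int(rng.integers(len(traj) // 2, len(traj) - 5))
        traj[release_step:, -1] = 1.0
    elif failure_mode == "truncated_motion":
        # Stop early, leaving the subtask unfinished.
        traj = traj[: int(rng.integers(len(traj) // 3, 2 * len(traj) // 3))]
    return traj

rng = np.random.default_rng(0)
demo = np.zeros((100, 8))  # stand-in for one successful demonstration
failed_demo = perturb_trajectory(demo, "premature_release", rng)
```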
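The second key point describes VLM-generated, structured step-by-step reasoning traces. A hedged guess at what one such record could contain is shown below; the field names are assumptions for illustration, not the published FailCoT schema.

```python
# Hypothetical record layout for one failure-reasoning example; field names
# are illustrative, not the dataset's actual schema.
from dataclasses import dataclass
from typing import List

@dataclass
class FailureReasoningRecord:
    task_instruction: str        # e.g. "put the red block in the drawer"
    source: str                  # "RLBench" or "BridgeDataV2"
    view_paths: List[str]        # multi-view image files for the episode
    reasoning_steps: List[str]   # step-by-step reasoning trace from the VLM
    failure_detected: bool       # did the episode fail?
    failure_cause: str = ""      # short natural-language cause, if failed
    recovery_suggestion: str = ""  # proposed corrective action, if failed

record = FailureReasoningRecord(
    task_instruction="stack the green cup on the saucer",
    source="RLBench",
    view_paths=["front_0001.png", "wrist_0001.png"],
    reasoning_steps=[
        "Step 1: the cup is grasped but tilted.",
        "Step 2: the cup slips before reaching the saucer.",
    ],
    failure_detected=True,
    failure_cause="unstable grasp led to the object being dropped",
    recovery_suggestion="re-grasp the cup closer to its center of mass",
)
```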
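Finally, the third and fifth key points describe Guardian verifying execution for an LLM-based manipulation policy. A sketch of how such a verify-then-recover loop could be wired is below; the planner, policy, verifier, and environment interfaces are all assumptions, not the paper's API.

```python
# Assumed interfaces throughout; this only illustrates the verify-then-recover
# control flow, not Guardian's actual implementation.
from dataclasses import dataclass

@dataclass
class Verdict:
    success: bool
    failure_reason: str = ""

def run_with_verification(planner, policy, verifier, env, task, max_retries=2):
    """Execute each planned subtask, let the verifier check the outcome from
    multi-view images, and re-plan when it reports a failure."""
    for subtask in planner.plan(task):          # planner returns subtask strings
        for _ in range(max_retries + 1):
            policy.execute(env, subtask)        # low-level rollout of one subtask
            verdict = verifier.check(subtask, env.capture_views())
            if verdict.success:
                break
            # Feed the verifier's failure reasoning back into recovery planning.
            subtask = planner.recover(subtask, verdict.failure_reason)
        else:
            return False                        # retries exhausted on this subtask
    return True
```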
Related Articles
- Knowledge Governance For The Agentic Economy (Dev.to)
- AI server farms heat up the neighborhood for miles around, paper finds (The Register)
- Does the Claude “leak” actually change anything in practice? (Reddit r/LocalLLaMA)
- 87.4% of My Agent's Decisions Run on a 0.8B Model (Dev.to)
- Paperclip: a free tool that turns AI agents into a software team (Dev.to)