RoboFAC: A Comprehensive Framework for Robotic Failure Analysis and Correction

arXiv cs.RO / 3/24/2026


Key Points

  • The paper introduces RoboFAC, a framework that aims to improve Vision-Language-Action (VLA) robotic manipulation by adding structured supervision for failure diagnosis and recovery rather than relying only on successful demonstrations.
  • It builds a failure-centric dataset with 9,440 erroneous trajectories and 78,623 QA pairs across 53 scenes in both simulation and real-world settings, with failure types systematically categorized.
  • RoboFAC uses a lightweight multimodal model for task understanding, failure analysis, and failure correction, designed to run locally while remaining competitive with large proprietary models.
  • Experimental results show RoboFAC improves failure analysis accuracy by 34.1% over GPT-4o and, when used as an external supervisor in a real-world VLA pipeline, delivers a 29.1% relative performance gain across four tasks with lower latency than GPT-4o.
  • The authors publicly release both the model and dataset on GitHub, enabling other researchers to adopt the framework for more robust open-world robot recovery.
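
The 29.1% figure above is reported as a relative gain, which is easy to confuse with a difference in percentage points. A minimal Python sketch of the distinction follows; the success rates in it are made-up illustrative values, not numbers from the paper, and `relative_gain` is a hypothetical helper name.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Return the relative improvement of `improved` over `baseline`.

    Both arguments are success rates in [0, 1]. The result is a fraction,
    so 0.25 means "25% better than the baseline".
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive to define a relative gain")
    return (improved - baseline) / baseline


# Illustrative values only (NOT taken from the RoboFAC paper).
baseline_rate = 0.40   # hypothetical success rate without a supervisor
improved_rate = 0.50   # hypothetical success rate with a supervisor

absolute_points = (improved_rate - baseline_rate) * 100
gain = relative_gain(baseline_rate, improved_rate)

print(f"absolute change: {absolute_points:.1f} percentage points")
print(f"relative gain:   {gain:.1%}")
```

With these toy values, a 10-point absolute change corresponds to a 25% relative gain, which is why the two kinds of figure should not be compared directly.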

Abstract

Vision-Language-Action (VLA) models have recently advanced robotic manipulation by translating natural-language instructions and visual observations into control actions. However, existing VLAs are primarily trained on successful expert demonstrations and lack structured supervision for failure diagnosis and recovery, limiting robustness in open-world scenarios. To address this limitation, we propose the Robotic Failure Analysis and Correction (RoboFAC) framework. We construct a large-scale failure-centric dataset comprising 9,440 erroneous manipulation trajectories and 78,623 QA pairs across 53 scenes in both simulation and real-world environments, with systematically categorized failure types. Leveraging this dataset, we develop a lightweight multimodal model specialized for task understanding, failure analysis, and failure correction, enabling efficient local deployment while remaining competitive with large proprietary models. Experimental results demonstrate that RoboFAC achieves 34.1% higher failure analysis accuracy than GPT-4o. Furthermore, we integrate RoboFAC as an external supervisor in a real-world VLA control pipeline, yielding a 29.1% relative improvement across four tasks while significantly reducing latency relative to GPT-4o. These results demonstrate that RoboFAC enables systematic failure diagnosis and recovery, significantly enhancing VLA recovery capabilities. Our model and dataset are publicly available at https://github.com/MINT-SJTU/RoboFAC.
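
To make the "external supervisor" idea concrete, here is a rough Python sketch of a control loop in which a critic model inspects each attempt and, on failure, proposes a corrective instruction. This is an assumption-laden illustration, not the RoboFAC implementation or API: `Diagnosis`, `run_with_supervisor`, `fake_execute`, and `fake_critic` are hypothetical names, and the policy and critic are replaced by toy stubs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Diagnosis:
    """Hypothetical critic output: whether the attempt failed, why, and a fix."""
    failed: bool
    explanation: str = ""
    correction: Optional[str] = None


def run_with_supervisor(
    instruction: str,
    execute: Callable[[str], str],
    critic: Callable[[str, str], Diagnosis],
    max_corrections: int = 3,
) -> Tuple[bool, List[str]]:
    """Run a task, letting the critic issue corrective instructions on failure.

    Returns (success, list of instructions that were actually executed).
    """
    issued = [instruction]
    current = instruction
    for attempt in range(max_corrections + 1):
        observation = execute(current)               # policy acts in the world
        diagnosis = critic(instruction, observation)  # supervisor checks result
        if not diagnosis.failed:
            return True, issued
        # Stop if the critic has no fix or the correction budget is spent.
        if diagnosis.correction is None or attempt == max_corrections:
            break
        current = diagnosis.correction
        issued.append(current)
    return False, issued


# Toy stand-ins for a VLA policy and a failure-analysis model (assumptions).
def fake_execute(command: str) -> str:
    return "cube in bowl" if command.startswith("re-grasp") else "cube dropped"


def fake_critic(task: str, observation: str) -> Diagnosis:
    if "dropped" in observation:
        return Diagnosis(True, "grasp slipped",
                         "re-grasp the cube and place it in the bowl")
    return Diagnosis(False)


success, issued = run_with_supervisor(
    "put the cube in the bowl", fake_execute, fake_critic
)
print(success, issued)
```

Bounding the number of corrections keeps the loop from cycling indefinitely when the critic keeps proposing fixes that do not work; the budget of three is an arbitrary choice for the sketch.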