CRAFT: Video Diffusion for Bimanual Robot Data Generation
arXiv cs.RO / 4/7/2026
Key Points
- CRAFT introduces a diffusion-based framework that generates scalable, temporally coherent bimanual robot demonstration videos with associated action labels for training.
- The method conditions video diffusion on Canny/edge-based structural cues derived from simulator trajectories, enabling physically plausible trajectory variations and a unified augmentation pipeline.
- It supports diverse synthetic variations including object pose changes, camera viewpoint/lighting/background shifts, cross-embodiment transfer, and multi-view synthesis.
- By starting from only a few real-world demonstrations and avoiding real-robot replay, CRAFT addresses the high cost and low diversity of real-world data collection and improves Sim2Real training.
- Experiments on both simulated and real-world bimanual tasks show higher success rates than existing augmentation and simple data-scaling baselines, indicating better generalization for dual-arm manipulation.
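The edge-conditioning idea in the second point can be sketched in code: extract a per-frame edge map from simulator renders and stack the maps into a conditioning tensor for a video diffusion model. The sketch below is illustrative, not CRAFT's implementation; it uses a simplified Sobel-magnitude threshold as a stand-in for a true Canny detector, and the names `edge_map`, `make_conditioning`, and the `thresh` parameter are assumptions.

```python
import numpy as np

def edge_map(frame, thresh=0.2):
    """Simplified edge detector (Sobel-style gradient magnitude + threshold)
    standing in for the Canny edges derived from simulator renders.
    frame: (H, W) float array with values in [0, 1]."""
    gx = np.zeros_like(frame)
    gy = np.zeros_like(frame)
    gx[:, 1:-1] = frame[:, 2:] - frame[:, :-2]   # horizontal central difference
    gy[1:-1, :] = frame[2:, :] - frame[:-2, :]   # vertical central difference
    mag = np.hypot(gx, gy)                       # gradient magnitude
    return (mag > thresh).astype(np.float32)     # binary edge map

def make_conditioning(frames, thresh=0.2):
    """Stack per-frame edge maps into a (T, 1, H, W) tensor that a video
    diffusion model could consume as structural conditioning."""
    return np.stack([edge_map(f, thresh) for f in frames])[:, None]

# Toy "trajectory": a bright square translating across a dark background,
# mimicking an object moving through a rendered simulator scene.
T, H, W = 4, 32, 32
frames = []
for t in range(T):
    f = np.zeros((H, W), dtype=np.float32)
    f[8:24, 4 + 4 * t : 20 + 4 * t] = 1.0
    frames.append(f)

cond = make_conditioning(frames)
print(cond.shape)  # (4, 1, 32, 32): one single-channel edge map per frame
```

Because the edge maps track object silhouettes rather than appearance, the same conditioning can be reused while varying texture, lighting, or viewpoint in the generated video, which is what makes this a convenient handle for the augmentations listed above.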