Chain Of Interaction Benchmark (COIN): When Reasoning meets Embodied Interaction
arXiv cs.RO / 4/21/2026
Key Points
- The paper introduces the Chain Of Interaction Benchmark (COIN) to evaluate generalist embodied agents on interactive, causally dependent reasoning in long-horizon robotic manipulation tasks under partial observability.
- COIN is built from three components—COIN-50 (50 daily interactive tasks), COIN-Primitive (causally dependent primitives), and COIN-Composition (mid-term composition tasks)—to measure both skill learning and generalization.
- The authors build a low-cost mobile AR teleoperation system and use it to collect a dataset of 50 demonstrations per primitive task (1,000 demonstrations in total).
- They propose evaluation metrics focused on execution stability and generalization robustness, and apply them to approaches including CodeAsPolicy, VLA, and language-conditioned H-VLA.
- Results show that current models exhibit major gaps between visual understanding and motor execution, and the paper provides a detailed breakdown of these shortcomings.