CaP-X: A Framework for Benchmarking and Improving Coding Agents for Robot Manipulation
arXiv cs.RO / 3/25/2026
Key Points
- The paper introduces CaP-X, an open-access framework to benchmark and improve “Code-as-Policy” coding agents for embodied robot manipulation.
- Its core component, CaP-Gym, provides an interactive environment where agents control robots by synthesizing and executing programs that combine perception and control primitives.
- Using CaP-Bench, the authors evaluate 12 frontier language and vision-language models and find that performance improves when models are given human-crafted abstractions and degrades when those priors are removed, which highlights their dependence on designer scaffolding.
- The study shows that robustness can be improved by scaling test-time computation (e.g., multi-turn interaction, structured execution feedback, visual differencing, skill synthesis, and ensembling), and it proposes CaP-Agent0, a training-free method that achieves human-level reliability on multiple tasks.
- It also proposes CaP-RL, demonstrating that reinforcement learning with verifiable rewards improves success rates and transfers from simulation to real robots with only a small performance gap.
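
The multi-turn interaction and structured execution feedback mentioned above can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not CaP-Gym's or CaP-Agent0's actual interface: the primitives `detect` and `move_to`, the helpers `run_program` and `multi_turn`, and the scripted stand-in for a language model are all hypothetical names invented for illustration.

```python
import traceback
from typing import Any, Callable, Dict, Optional


def make_primitives(state: Dict[str, Any]) -> Dict[str, Callable]:
    """Build toy perception/control primitives (hypothetical, not CaP-Gym's API)."""

    def detect(name: str) -> Optional[tuple]:
        # Toy perception primitive: look up an object's position, or None.
        return state["objects"].get(name)

    def move_to(pos: Optional[tuple]) -> None:
        # Toy control primitive: move the gripper to a position.
        if pos is None:
            raise ValueError("move_to called with no target position")
        state["gripper"] = pos

    return {"detect": detect, "move_to": move_to}


def run_program(source: str, state: Dict[str, Any]) -> Optional[str]:
    """Execute a generated program; return None on success, else an error report."""
    try:
        exec(source, make_primitives(state))
        return None
    except Exception:
        return traceback.format_exc(limit=1)


def multi_turn(generate: Callable[[Optional[str]], str],
               state: Dict[str, Any], max_turns: int = 3) -> int:
    """Regenerate the program with execution feedback until it runs cleanly.

    Returns the number of turns used, or -1 if every turn failed.
    """
    feedback: Optional[str] = None
    for turn in range(1, max_turns + 1):
        program = generate(feedback)
        feedback = run_program(program, state)
        if feedback is None:
            return turn
    return -1


def scripted_generator(feedback: Optional[str]) -> str:
    """Stand-in for a model: the first attempt names the wrong object."""
    if feedback is None:
        return "move_to(detect('cup'))"
    return "move_to(detect('mug'))"


state = {"objects": {"mug": (0.4, 0.1, 0.0)}, "gripper": (0.0, 0.0, 0.0)}
turns = multi_turn(scripted_generator, state)
```

In this sketch the first program fails because no object named `cup` exists, the error report is passed back to the generator, and the second program succeeds. A real agent would replace `scripted_generator` with a model call and would run generated code in a sandbox rather than with a bare `exec`.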