Bimanual Robot Manipulation via Multi-Agent In-Context Learning

arXiv cs.RO / 4/23/2026

Key Points

The paper proposes BiCICLe, a framework that enables standard text-only LLMs to perform few-shot bimanual robot manipulation without fine-tuning by leveraging in-context learning.
It addresses bimanual coordination challenges by modeling control as a multi-agent leader-follower setup that sequentially predicts conditioned single-arm actions, reducing pressure on the LLM context window.
The approach extends to “Arms' Debate,” an iterative refinement loop that improves trajectory plausibility, and adds a third LLM acting as a judge (LLM-as-Judge) to select the most plausible coordinated trajectory.
Experiments on 13 tasks from the TWIN benchmark show BiCICLe reaches up to 71.1% average success rate, improving over the best training-free baseline by 6.7 percentage points and often outperforming supervised methods.
The authors also report strong few-shot generalization to novel tasks.
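
The leader-follower decomposition described in the key points can be illustrated with a minimal sketch. It is not the paper's implementation: the prompt layout, the space-separated numeric action format, and every name here (`LLM`, `build_prompt`, `predict_bimanual`, and the two stub models) are assumptions made for illustration. The stubs stand in for real LLM calls so the example runs offline.

```python
from typing import Callable, List, Sequence, Tuple

Action = List[float]        # one arm's action; the vector format is an assumption
LLM = Callable[[str], str]  # hypothetical text-in, text-out model interface


def format_action(action: Sequence[float]) -> str:
    """Serialize an action as space-separated numbers with two decimals."""
    return " ".join(f"{v:.2f}" for v in action)


def parse_action(text: str) -> Action:
    """Parse a space-separated numeric action string."""
    return [float(tok) for tok in text.split()]


def build_prompt(role: str, demos: Sequence[str], observation: str,
                 leader_action: str = "") -> str:
    """Assemble a few-shot prompt for a single arm (layout is illustrative)."""
    lines = [f"You control the {role} arm.", "Demonstrations:"]
    lines.extend(demos)
    lines.append(f"Observation: {observation}")
    if leader_action:
        # The follower is conditioned on the leader's already-predicted action.
        lines.append(f"Leader action: {leader_action}")
    lines.append("Action:")
    return "\n".join(lines)


def predict_bimanual(leader: LLM, follower: LLM, leader_demos: Sequence[str],
                     follower_demos: Sequence[str],
                     observation: str) -> Tuple[Action, Action]:
    """Predict the leader arm first, then the follower arm given the leader."""
    leader_action = parse_action(
        leader(build_prompt("leader", leader_demos, observation)))
    follower_action = parse_action(
        follower(build_prompt("follower", follower_demos, observation,
                              format_action(leader_action))))
    return leader_action, follower_action


def stub_leader(prompt: str) -> str:
    """Stand-in for a real LLM: always returns a fixed action."""
    return "0.10 0.20 0.30"


def stub_follower(prompt: str) -> str:
    """Stand-in for a real LLM: mirrors the leader's first coordinate."""
    prefix = "Leader action: "
    line = next(l for l in prompt.splitlines() if l.startswith(prefix))
    values = parse_action(line[len(prefix):])
    values[0] = -values[0]
    return format_action(values)


leader_action, follower_action = predict_bimanual(
    stub_leader, stub_follower, ["(demo 1)"], ["(demo 1)"], "cube at center")
```

Each prompt carries only one arm's demonstrations and action, which is how this kind of decomposition keeps individual prompts smaller than a single joint two-arm prompt would be.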

Abstract

Large Language Models (LLMs) have emerged as powerful reasoning engines for embodied control. In particular, In-Context Learning (ICL) enables off-the-shelf, text-only LLMs to predict robot actions without any task-specific training while preserving their generalization capabilities. Applying ICL to bimanual manipulation remains challenging, as the high-dimensional joint action space and tight inter-arm coordination constraints rapidly overwhelm standard context windows. To address this, we introduce BiCICLe (Bimanual Coordinated In-Context Learning), the first framework that enables standard LLMs to perform few-shot bimanual manipulation without fine-tuning. BiCICLe frames bimanual control as a multi-agent leader-follower problem, decoupling the action space into sequential, conditioned single-arm predictions. This naturally extends to Arms' Debate, an iterative refinement process, and to the introduction of a third LLM-as-Judge to evaluate and select the most plausible coordinated trajectories. Evaluated on 13 tasks from the TWIN benchmark, BiCICLe achieves up to 71.1% average success rate, outperforming the best training-free baseline by 6.7 percentage points and surpassing most supervised methods. We further demonstrate strong few-shot generalization on novel tasks.
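
The abstract names two further mechanisms, Arms' Debate and an LLM-as-Judge, without giving their details here. The sketch below shows one plausible shape for each, under stated assumptions: the number of rounds, the prompt wording, the index-based judge reply, and all names (`arms_debate`, `judge_select`, `make_stub`) are hypothetical and not taken from the paper. Stub models replace real LLM calls so the code runs as written.

```python
from typing import Callable, List, Tuple

Plan = Tuple[str, str]      # (leader action text, follower action text)
LLM = Callable[[str], str]  # hypothetical text-in, text-out model interface


def arms_debate(leader: LLM, follower: LLM, observation: str,
                rounds: int = 2) -> Plan:
    """Alternate revisions between the two arms for a fixed number of rounds."""
    leader_act = leader(
        f"Observation: {observation}\nPropose leader action:")
    follower_act = follower(
        f"Observation: {observation}\nLeader action: {leader_act}\n"
        "Propose follower action:")
    for _ in range(rounds):
        # Each arm sees the other's latest proposal and may revise its own.
        leader_act = leader(
            f"Observation: {observation}\nYour action: {leader_act}\n"
            f"Follower action: {follower_act}\nRevise leader action:")
        follower_act = follower(
            f"Observation: {observation}\nYour action: {follower_act}\n"
            f"Leader action: {leader_act}\nRevise follower action:")
    return leader_act, follower_act


def judge_select(judge: LLM, observation: str, candidates: List[Plan]) -> Plan:
    """Ask a third model for the index of the most plausible candidate plan."""
    listing = "\n".join(
        f"{i}: leader={l} | follower={f}"
        for i, (l, f) in enumerate(candidates))
    reply = judge(
        f"Observation: {observation}\nCandidates:\n{listing}\n"
        "Answer with the index of the most plausible plan:")
    try:
        index = int(reply.strip())
    except ValueError:
        index = 0  # fall back to the first candidate on an unparsable reply
    if not 0 <= index < len(candidates):
        index = 0
    return candidates[index]


def make_stub(reply: str, log: List[str]) -> LLM:
    """Build a stand-in model that records prompts and returns a fixed reply."""
    def call(prompt: str) -> str:
        log.append(prompt)
        return reply
    return call


leader_log: List[str] = []
follower_log: List[str] = []
plan = arms_debate(make_stub("move to handle", leader_log),
                   make_stub("hold tray", follower_log),
                   "tray on table", rounds=2)
best = judge_select(make_stub("1", []), "tray on table",
                    [("a", "b"), plan])
```

With `rounds=2`, each stub arm is called three times (one proposal plus two revisions), and the judge stub's reply of `1` selects the debated plan. A real judge would need explicit scoring criteria, which this summary does not specify.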