EnergyAction: Unimanual to Bimanual Composition with Energy-Based Models

arXiv cs.RO / 3/24/2026


Key Points

  • EnergyAction is a framework for compositional transfer from unimanual manipulation policies to bimanual robot tasks using energy-based models (EBMs) despite limited bimanual demonstration data.
  • The approach represents left and right unimanual policies as EBMs and composes them into a unified bimanual policy, aiming to better leverage existing unimanual knowledge.
  • It adds an energy-constrained temporal-spatial coordination mechanism to ensure bimanual action sequences are temporally coherent and spatially feasible.
  • The method proposes two energy-aware denoising strategies that adapt the number of denoising steps to an assessment of action quality, producing high-quality actions at lower computational cost than fixed-step denoising.
  • Experiments on simulated and real-world bimanual tasks show EnergyAction outperforms prior approaches while requiring minimal bimanual data.
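
The composition idea in the points above can be illustrated with a toy example. For EBMs in general, multiplying two densities of the form p ∝ exp(-E) corresponds to adding their energies, which is what makes composing a left-arm and a right-arm policy natural. The sketch below is not the paper's implementation: the quadratic per-arm energies, the smoothness and minimum-gap penalties standing in for temporal-spatial coordination, the function names, and the finite-difference gradient descent used in place of a learned sampler are all assumptions made for illustration.

```python
import math
import random

def arm_energy(traj, target):
    """Toy unimanual energy (assumed): squared distance of each waypoint to a target."""
    return sum((p[0] - target[0]) ** 2 + (p[1] - target[1]) ** 2 for p in traj)

def coordination_energy(left, right, min_gap=0.2):
    """Toy coupling term (assumed): temporal smoothness plus a minimum-separation penalty."""
    smooth = 0.0
    for traj in (left, right):
        for a, b in zip(traj, traj[1:]):
            smooth += (b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2
    too_close = 0.0
    for l, r in zip(left, right):
        gap = math.hypot(l[0] - r[0], l[1] - r[1])
        too_close += max(0.0, min_gap - gap) ** 2
    return smooth + too_close

def composed_energy(left, right, left_target, right_target, weight=1.0):
    """Composition by summation: per-arm energies plus the weighted coupling term."""
    return (arm_energy(left, left_target)
            + arm_energy(right, right_target)
            + weight * coordination_energy(left, right))

def sample_bimanual(left_target, right_target, horizon=4, steps=200, lr=0.05, seed=0):
    """Minimise the composed energy by gradient descent with central finite differences."""
    rng = random.Random(seed)
    left = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(horizon)]
    right = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(horizon)]
    energy = lambda: composed_energy(left, right, left_target, right_target)
    eps = 1e-5
    for _ in range(steps):
        grads = []
        for traj in (left, right):
            for point in traj:
                for k in range(2):
                    original = point[k]
                    point[k] = original + eps
                    up = energy()
                    point[k] = original - eps
                    down = energy()
                    point[k] = original  # restore before probing the next coordinate
                    grads.append((point, k, (up - down) / (2 * eps)))
        # Apply all updates together so every gradient refers to the same iterate.
        for point, k, g in grads:
            point[k] -= lr * g
    return left, right

if __name__ == "__main__":
    lt, rt = (-0.5, 0.0), (0.5, 0.0)
    left, right = sample_bimanual(lt, rt)
    print("final composed energy:", composed_energy(left, right, lt, rt))
```

Because the joint trajectory is optimised against the summed energy, neither arm's term needs to know about the other; only the coupling term ties them together. In this toy setup that is the only place where bimanual-specific structure enters.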

Abstract

Recent advances in unimanual manipulation policies have achieved remarkable success across diverse robotic tasks through abundant training data and well-established model architectures. However, extending these capabilities to bimanual manipulation remains challenging due to the lack of bimanual demonstration data and the complexity of coordinating dual-arm actions. Existing approaches either rely on extensive bimanual datasets or fail to effectively leverage pre-trained unimanual policies. To address this limitation, we propose EnergyAction, a novel framework that compositionally transfers unimanual manipulation policies to bimanual tasks through Energy-Based Models (EBMs). Specifically, our method incorporates three key innovations. First, we model individual unimanual policies as EBMs and leverage their compositional properties to compose left and right arm actions, enabling the fusion of unimanual policies into a bimanual policy. Second, we introduce an energy-based temporal-spatial coordination mechanism that uses energy constraints to ensure the generated bimanual actions are both temporally coherent and spatially feasible. Third, we propose two different energy-aware denoising strategies that dynamically adapt the number of denoising steps based on action quality assessment. These strategies ensure the generation of high-quality actions while maintaining superior computational efficiency compared to fixed-step denoising approaches. Experimental results demonstrate that EnergyAction effectively transfers unimanual knowledge to bimanual tasks, achieving superior performance on both simulated and real-world tasks with minimal bimanual data.
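
The abstract does not spell out the two energy-aware denoising strategies, so the following is only a minimal sketch of the general idea of adapting the step count to action quality. It assumes one plausible rule: stop denoising as soon as the action's energy falls below a threshold, instead of always running a fixed number of steps. The names `adaptive_denoise`, `energy_fn`, `denoise_step`, and `energy_threshold`, and the toy energy and update used in the demo, are hypothetical.

```python
def adaptive_denoise(action, energy_fn, denoise_step, max_steps=50, energy_threshold=1e-3):
    """Apply denoising steps until the energy is low enough or the budget runs out.

    Returns the final action and the number of denoising steps actually applied.
    """
    for steps_used in range(max_steps):
        if energy_fn(action) <= energy_threshold:
            return action, steps_used  # quality is good enough: stop early
        action = denoise_step(action)
    return action, max_steps

# Toy stand-ins (assumed): energy is the squared distance to a target action,
# and one "denoising step" moves the action halfway towards that target.
TARGET = [0.0, 0.0]

def toy_energy(action):
    return sum((a - t) ** 2 for a, t in zip(action, TARGET))

def toy_denoise_step(action):
    return [a + 0.5 * (t - a) for a, t in zip(action, TARGET)]

if __name__ == "__main__":
    action, steps_used = adaptive_denoise([1.0, 1.0], toy_energy, toy_denoise_step)
    print("steps used:", steps_used, "final energy:", toy_energy(action))
```

A fixed-step sampler would spend all `max_steps` iterations here; the early exit is where the efficiency gain over fixed-step denoising would come from under this assumed rule.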
