SynAgent: Generalizable Cooperative Humanoid Manipulation via Solo-to-Cooperative Agent Synergy

arXiv cs.RO, April 22, 2026


Key Points

  • The paper introduces SynAgent, a unified approach for scalable, physically plausible cooperative humanoid manipulation that transfers skills from single-agent human–object interaction to multi-agent human–object–human scenarios.
  • It proposes an interaction-preserving retargeting technique using an Interact Mesh built via Delaunay tetrahedralization to maintain semantic/spatial relationships during motion transfer.
  • SynAgent uses a single-agent pretraining and adaptation pipeline that bootstraps cooperative behaviors from abundant single-human data, employing decentralized training and multi-agent PPO.
  • For stable, controllable execution, it develops a trajectory-conditioned generative policy based on a conditional VAE, trained with multi-teacher distillation from motion imitation priors.
  • Experiments reportedly show SynAgent outperforms existing baselines in cooperative imitation and trajectory-conditioned control, with improved generalization across varied object geometries.
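To make the Interact Mesh idea concrete, here is a minimal sketch of building a tetrahedral interaction structure over a combined human-and-object point cloud with `scipy.spatial.Delaunay`. The point counts, the random cloud, and the human/object split are illustrative assumptions, not the paper's actual construction.

```python
import numpy as np
from scipy.spatial import Delaunay

# Hypothetical setup: stack human joint positions and sampled object
# surface points into one 3-D point cloud, then tetrahedralize it.
rng = np.random.default_rng(0)
human_joints = rng.normal(size=(22, 3))          # e.g. a 22-joint skeleton
object_points = rng.normal(size=(50, 3)) + 2.0   # points sampled on an object
cloud = np.vstack([human_joints, object_points])

mesh = Delaunay(cloud)      # 3-D input yields tetrahedral cells
tets = mesh.simplices       # shape (n_cells, 4): vertex indices per cell

# Cells whose vertices span both the human and the object encode the
# spatial relationship that retargeting should preserve.
mixed = [t for t in tets if (t < 22).any() and (t >= 22).any()]
```

The intuition is that edges of these human-object tetrahedra act as soft constraints: keeping their relative arrangement intact while transferring motion to a new body or object preserves the semantics of the interaction.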

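The decentralized multi-agent PPO ingredient can be sketched generically: each agent estimates advantages over its own rollout and optimizes a clipped surrogate objective. The functions below show the standard GAE and PPO-clip formulas in numpy; the rollout values and hyperparameters are illustrative, and the paper's exact losses and reward terms are not reproduced here.

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one finite rollout."""
    adv = np.zeros_like(rewards)
    last = 0.0
    for t in reversed(range(len(rewards))):
        next_v = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_v - values[t]
        last = delta + gamma * lam * last
        adv[t] = last
    return adv

def ppo_clip_loss(logp_new, logp_old, adv, eps=0.2):
    """Clipped PPO surrogate (to be minimized), averaged over the batch."""
    ratio = np.exp(logp_new - logp_old)
    return -np.mean(np.minimum(ratio * adv,
                               np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv))

# Toy 3-step rollout for one agent.
rewards = np.array([1.0, 0.5, 0.0])
values = np.array([0.8, 0.4, 0.1])
adv = gae(rewards, values)
loss = ppo_clip_loss(np.array([-0.9, -1.1, -1.0]),
                     np.array([-1.0, -1.0, -1.0]), adv)
```

In a decentralized scheme, each humanoid runs this update on its own observations and actions, so cooperation has to emerge through the shared environment and reward rather than through a centralized critic.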
Abstract

Controllable cooperative humanoid manipulation is a fundamental yet challenging problem for embodied intelligence, due to severe data scarcity, complexities in multi-agent coordination, and limited generalization across objects. In this paper, we present SynAgent, a unified framework that enables scalable and physically plausible cooperative manipulation by leveraging Solo-to-Cooperative Agent Synergy to transfer skills from single-agent human-object interaction to multi-agent human-object-human scenarios. To maintain semantic integrity during motion transfer, we introduce an interaction-preserving retargeting method based on an Interact Mesh constructed via Delaunay tetrahedralization, which faithfully maintains spatial relationships among humans and objects. Building upon this refined data, we propose a single-agent pretraining and adaptation paradigm that bootstraps synergistic collaborative behaviors from abundant single-human data through decentralized training and multi-agent PPO. Finally, we develop a trajectory-conditioned generative policy using a conditional VAE, trained via multi-teacher distillation from motion imitation priors to achieve stable and controllable object-level trajectory execution. Extensive experiments demonstrate that SynAgent significantly outperforms existing baselines in both cooperative imitation and trajectory-conditioned control, while generalizing across diverse object geometries. Code and data will be available after publication. Project Page: http://yw0208.github.io/synagent
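The trajectory-conditioned CVAE policy mentioned above can be illustrated with a minimal forward pass: a latent is sampled via the reparameterization trick from an encoder conditioned on the desired object trajectory, and a decoder maps the latent plus condition to an action. All layer sizes, the single linear layer per module, and the variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, traj_dim, z_dim, act_dim = 8, 6, 4, 5

# Illustrative single-linear-layer "networks" (random, untrained weights).
W_enc = rng.normal(scale=0.1, size=(obs_dim + traj_dim, 2 * z_dim))
W_dec = rng.normal(scale=0.1, size=(z_dim + traj_dim, act_dim))

def policy(obs, traj):
    # Encoder: condition on observation and target object trajectory.
    h = np.concatenate([obs, traj]) @ W_enc
    mu, logvar = h[:z_dim], h[z_dim:]
    # Reparameterization trick: z = mu + sigma * noise.
    z = mu + np.exp(0.5 * logvar) * rng.normal(size=z_dim)
    # Decoder: map (latent, condition) to a bounded action.
    return np.tanh(np.concatenate([z, traj]) @ W_dec)

action = policy(rng.normal(size=obs_dim), rng.normal(size=traj_dim))
```

Conditioning the latent on the object-level trajectory is what makes execution controllable: at test time one can steer the humanoids by supplying a desired trajectory, while the distilled imitation priors keep the sampled motions physically plausible.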