Stability-Driven Motion Generation for Object-Guided Human-Human Co-Manipulation

arXiv cs.CV / 4/23/2026

Key Points

  • The paper presents a flow-matching framework for generating human-human co-manipulation motions that stay natural and preserve stable states while accounting for object-induced dynamics.
  • It introduces an explicit manipulation strategy generator derived from the object’s affordances and spatial configuration to guide the motion toward successful shared manipulation.
  • To improve realism, the method adds an adversarial interaction prior that encourages natural individual poses and more realistic inter-person interactions.
  • It further integrates stability-driven simulation into the flow-matching process, using sampling-based optimization to refine unstable states and adjusting the vector-field regression to boost manipulation effectiveness.
  • Experiments show improved contact accuracy, reduced penetration, and higher distributional fidelity versus state-of-the-art human-object interaction baselines, with code released on GitHub.
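
For readers unfamiliar with flow matching, the core idea can be sketched in a few lines. This is a generic illustration with a straight-line probability path, not the paper's model: the function names (`interpolate`, `target_velocity`, `flow_matching_loss`, `euler_sample`) and the use of plain lists as "motion states" are assumptions made here for illustration. A real system would use a neural network for `predict_velocity` and condition it on the object and the manipulation strategy.

```python
# Generic flow-matching sketch (hypothetical names, not the paper's code).
# A "state" is a flat list of floats standing in for a motion sample.

def interpolate(x0, x1, t):
    """Point at time t on the straight line from noise x0 to data x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the straight-line path; it is constant in t."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(predict_velocity, x0, x1, t):
    """Mean squared error between predicted and target velocity at x_t."""
    x_t = interpolate(x0, x1, t)
    v_pred = predict_velocity(x_t, t)
    v_true = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(v_true)

def euler_sample(predict_velocity, x0, num_steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with explicit Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        v = predict_velocity(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

if __name__ == "__main__":
    noise, data = [0.0, 1.0], [2.0, -1.0]
    # An "oracle" field that already knows the target velocity.
    oracle = lambda x, t: target_velocity(noise, data)
    print(flow_matching_loss(oracle, noise, data, 0.3))
    print(euler_sample(oracle, noise))
```

Training regresses the predicted vector field onto the target velocity. Sampling then integrates that field from noise to a motion sample. According to the summary above, the paper adjusts this vector-field regression using stability feedback; that step is not reproduced here.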

Abstract

Co-manipulation requires multiple humans to synchronize their motions with a shared object while ensuring reasonable interactions, maintaining natural poses, and preserving stable states. However, most existing motion generation approaches are designed for single-character scenarios or fail to account for payload-induced dynamics. In this work, we propose a flow-matching framework that ensures the generated co-manipulation motions align with the intended goals while maintaining naturalness and effectiveness. Specifically, we first introduce a generative model that derives explicit manipulation strategies from the object's affordance and spatial configuration, which guide the motion flow toward successful manipulation. To improve motion quality, we then design an adversarial interaction prior that promotes natural individual poses and realistic inter-person interactions during co-manipulation. We also incorporate a stability-driven simulation into the flow-matching process, which refines unstable interaction states through sampling-based optimization and directly adjusts the vector field regression to promote more effective manipulation. The experimental results demonstrate that our method achieves higher contact accuracy, lower penetration, and better distributional fidelity compared to state-of-the-art human-object interaction baselines. The code is available at https://github.com/boycehbz/StaCOM.
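
The abstract mentions refining unstable states through sampling-based optimization without giving details. As a rough intuition only, the sketch below uses a simple random-search scheme: perturb the current state with Gaussian noise, score each candidate with a cost, and keep the best. The function `refine_state`, the toy cost `toy_instability`, and all parameter values are hypothetical and are not taken from the paper or its repository; the actual method scores states with a physics simulation.

```python
import random

def toy_instability(state, com_x=0.0):
    """Hypothetical cost: squared offset between the midpoint of two hand
    positions (state = [left_x, right_x]) and the object's center of mass."""
    left_x, right_x = state
    return (0.5 * (left_x + right_x) - com_x) ** 2

def refine_state(state, cost_fn, num_samples=64, sigma=0.1,
                 iterations=20, seed=0):
    """Random-search refinement: sample Gaussian perturbations around the
    current best state and accept a candidate only if it lowers the cost."""
    rng = random.Random(seed)
    best = list(state)
    best_cost = cost_fn(best)
    for _ in range(iterations):
        for _ in range(num_samples):
            candidate = [x + rng.gauss(0.0, sigma) for x in best]
            cost = cost_fn(candidate)
            if cost < best_cost:
                best, best_cost = candidate, cost
    return best, best_cost

if __name__ == "__main__":
    start = [0.4, 1.0]  # hands offset to one side of the object
    refined, cost = refine_state(start, toy_instability)
    print(toy_instability(start), cost)
```

Because a candidate is accepted only when it lowers the cost, the refined state is never worse than the starting one under the chosen cost. In a pipeline like the one the abstract describes, such a refined state could then serve as a corrected target for the vector-field regression, though how the paper does this is not specified here.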