Joint Prediction of Human Motions and Actions in Human-Robot Collaboration

arXiv cs.RO / 4/6/2026



Key Points

The paper proposes MA-HERP, a hierarchical and recursive probabilistic framework to jointly estimate and predict humans’ continuous motions and discrete actions during human–robot collaboration.
It models how continuous movements compose into actions using hierarchical structure with admissible Allen interval relations, while coupling continuous dynamics with discrete labels and durations in a unified probabilistic factorization.
A recursive inference procedure alternates top-down action prediction with bottom-up sensory evidence in a Bayesian-filtering-like scheme to improve robustness under noise.
Preliminary experiments using neural models trained on musculoskeletal simulations of reaching show accurate motion prediction, reliable action inference under noise, and computational performance suitable for online collaboration.
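To make the notion of Allen interval relations concrete, the following is a minimal Python sketch that classifies the relation between two time intervals and checks it against a set of admissible relations. It is an illustration only, not the paper's implementation: the function names, the example intervals, and the `ADMISSIBLE` set are hypothetical, and the summary does not say which relations MA-HERP actually admits.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (start, end), with start < end


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the name of the Allen relation that holds from interval a to interval b."""
    (a0, a1), (b0, b1) = a, b
    if a0 >= a1 or b0 >= b1:
        raise ValueError("each interval must satisfy start < end")
    # Disjoint or touching intervals.
    if a1 < b0:
        return "before"
    if a1 == b0:
        return "meets"
    if b1 < a0:
        return "after"
    if b1 == a0:
        return "met-by"
    # Intervals sharing one or both endpoints.
    if a0 == b0 and a1 == b1:
        return "equals"
    if a0 == b0:
        return "starts" if a1 < b1 else "started-by"
    if a1 == b1:
        return "finishes" if a0 > b0 else "finished-by"
    # Strict containment or partial overlap.
    if b0 < a0 and a1 < b1:
        return "during"
    if a0 < b0 and b1 < a1:
        return "contains"
    return "overlaps" if a0 < b0 else "overlapped-by"


# Hypothetical admissible set, chosen for illustration only.
ADMISSIBLE = {"before", "meets", "overlaps"}


def is_admissible(a: Interval, b: Interval) -> bool:
    """Check whether the relation from a to b is in the assumed admissible set."""
    return allen_relation(a, b) in ADMISSIBLE


# Example: a reaching movement immediately followed by a grasping movement.
reach = (0.0, 1.2)
grasp = (1.2, 1.8)
relation = allen_relation(reach, grasp)
```

Here `relation` evaluates to `"meets"`, because the reach ends exactly when the grasp begins. A hierarchical model could use a check like `is_admissible` to rule out movement sequences that cannot form a valid action.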

Abstract

Fluent human–robot collaboration requires robots to continuously estimate human behaviour and anticipate future intentions. This entails reasoning jointly about continuous movements and discrete actions, which are still largely modelled in isolation. In this paper, we introduce MA-HERP, a hierarchical and recursive probabilistic framework for the joint estimation and prediction of human movements and actions. The model combines: (i) a hierarchical representation in which movements compose into actions through admissible Allen interval relations, (ii) a unified probabilistic factorisation coupling continuous dynamics, discrete labels, and durations, and (iii) a recursive inference scheme inspired by Bayesian filtering, alternating top-down action prediction with bottom-up sensory evidence. We present a preliminary experimental evaluation based on neural models trained on musculoskeletal simulations of reaching movements, showing accurate motion prediction, robust action inference under noise, and computational performance compatible with on-line human–robot collaboration.
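
The alternation between top-down prediction and bottom-up evidence can be illustrated with a generic discrete Bayes filter over action labels. The sketch below is a textbook predict/update loop in Python, not MA-HERP itself: it omits the hierarchy, the durations, and the continuous dynamics, and the action names, transition matrix, and likelihood values are made-up numbers for illustration.

```python
from typing import List, Sequence


def predict(belief: Sequence[float], transition: Sequence[Sequence[float]]) -> List[float]:
    """Top-down step: propagate the action belief through a transition model.

    transition[i][j] is the assumed probability of moving from action i to action j.
    """
    n = len(belief)
    return [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]


def update(prior: Sequence[float], likelihood: Sequence[float]) -> List[float]:
    """Bottom-up step: reweight the prior by the observation likelihood and renormalise."""
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalised)
    if total == 0.0:
        raise ValueError("observation has zero likelihood under every action")
    return [u / total for u in unnormalised]


# Hypothetical two-action example: index 0 = "reach", index 1 = "retract".
belief = [0.5, 0.5]
transition = [
    [0.9, 0.1],  # from "reach"
    [0.2, 0.8],  # from "retract"
]
# Assumed likelihood of the current sensor reading under each action.
likelihood = [0.7, 0.2]

predicted = predict(belief, transition)   # belief before seeing the observation
posterior = update(predicted, likelihood)  # belief after seeing the observation
```

With these numbers the predicted belief is roughly 0.55 for "reach" and 0.45 for "retract", and the observation then shifts the posterior further towards "reach". Repeating the two steps at every time step gives the recursive structure that the abstract describes as inspired by Bayesian filtering.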