Joint Prediction of Human Motions and Actions in Human-Robot Collaboration

arXiv cs.RO / 4/6/2026

Key Points

  • The paper proposes MA-HERP, a hierarchical and recursive probabilistic framework to jointly estimate and predict humans’ continuous motions and discrete actions during human–robot collaboration.
  • It models how continuous movements compose into actions using a hierarchical structure constrained by admissible Allen interval relations, and couples continuous dynamics with discrete labels and durations in a single probabilistic factorization.
  • A recursive inference procedure alternates top-down action prediction with bottom-up sensory evidence in a Bayesian-filtering-like scheme to improve robustness under noise.
  • Preliminary experiments using neural models trained on musculoskeletal simulations of reaching show accurate motion prediction, reliable action inference under noise, and computational performance suitable for online collaboration.
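
To make the Allen interval relations in the key points concrete, here is a minimal Python sketch that classifies the temporal relation between two movement intervals. The `Interval` type, the `allen_relation` function, and the `ADMISSIBLE` set are hypothetical names chosen for illustration. The summary does not say which relations MA-HERP treats as admissible, so the set below is only an assumption.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A time interval with start < end (hypothetical helper type)."""
    start: float
    end: float


def allen_relation(a: Interval, b: Interval) -> str:
    """Return which of the 13 Allen relations holds between a and b."""
    if a.start >= a.end or b.start >= b.end:
        raise ValueError("intervals must satisfy start < end")
    # Disjoint or touching intervals.
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if b.end < a.start:
        return "after"
    if b.end == a.start:
        return "met-by"
    # From here on the two intervals share interior points.
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"
    return "overlaps" if a.start < b.start else "overlapped-by"


# Assumption: an illustrative admissible set, not taken from the paper.
ADMISSIBLE = {"before", "meets", "overlaps"}

reach = Interval(0.0, 1.2)   # hypothetical movement segments, in seconds
grasp = Interval(1.2, 1.8)
relation = allen_relation(reach, grasp)
is_admissible = relation in ADMISSIBLE
```

In this example `reach` ends exactly when `grasp` starts, so the relation is "meets". A hierarchical model could use such a check to reject candidate movement sequences whose timing does not fit an action template.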

Abstract

Fluent human–robot collaboration requires robots to continuously estimate human behaviour and anticipate future intentions. This entails reasoning jointly about continuous movements and discrete actions, which are still largely modelled in isolation. In this paper, we introduce MA-HERP, a hierarchical and recursive probabilistic framework for the joint estimation and prediction of human movements and actions. The model combines: (i) a hierarchical representation in which movements compose into actions through admissible Allen interval relations, (ii) a unified probabilistic factorisation coupling continuous dynamics, discrete labels, and durations, and (iii) a recursive inference scheme inspired by Bayesian filtering, alternating top-down action prediction with bottom-up sensory evidence. We present a preliminary experimental evaluation based on neural models trained on musculoskeletal simulations of reaching movements, showing accurate motion prediction, robust action inference under noise, and computational performance compatible with on-line human–robot collaboration.
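
The alternation between top-down prediction and bottom-up evidence described in the abstract resembles the predict/update cycle of a discrete Bayes filter. The sketch below shows that generic cycle over a belief on action labels. It is not the MA-HERP inference procedure: it omits the continuous dynamics, the durations, and the hierarchy. The function names, action labels, and all numbers are illustrative assumptions.

```python
def predict(belief, transition):
    """Top-down step: push the action belief through a transition model.

    transition[i][j] is the assumed probability of moving from action i to j.
    """
    n = len(belief)
    return [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]


def update(prior, likelihood):
    """Bottom-up step: reweight the predicted belief by the evidence likelihood."""
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalised)
    if total == 0.0:
        raise ValueError("observation has zero likelihood under every action")
    return [u / total for u in unnormalised]


def filter_step(belief, transition, likelihood):
    """One recursion of the filter: predict, then correct with evidence."""
    return update(predict(belief, transition), likelihood)


# Hypothetical two-action example: index 0 = "reach", index 1 = "grasp".
transition = [[0.8, 0.2],
              [0.1, 0.9]]
belief = [0.5, 0.5]        # uniform initial belief
likelihood = [0.9, 0.2]    # assumed p(observation | action) from a sensor model
posterior = filter_step(belief, transition, likelihood)
```

Starting from a uniform belief, the prediction step gives [0.45, 0.55]. Because the observation is more likely under "reach", the update step moves most of the probability mass to that action. Calling `filter_step` repeatedly with new likelihoods yields the recursive estimate.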