ExpressMM: Expressive Mobile Manipulation Behaviors in Human-Robot Interactions

arXiv cs.RO / 4/8/2026


Key Points

  • The paper introduces ExpressMM, a framework for generating expressive behaviors in mobile manipulators during human-robot collaborative tasks, aiming to communicate intent to nearby people.
  • ExpressMM combines a high-level language-guided planner using a vision-language model for perception and conversational reasoning with a low-level vision-language-action policy to produce task-appropriate expressive motions.
  • A key contribution is interruptible interaction support, enabling users to modify or redirect robot actions mid-execution rather than relying on fixed or demonstration-only behaviors.
  • The authors validate the approach on a mobile manipulator for collaborative assembly, including live audience-based HRI demonstrations and questionnaire-based evaluation of perceived interpretability, safety, and predictability.
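The two-level design described above can be illustrated with a minimal, purely hypothetical sketch. Nothing here is the authors' actual API: the class names, the `plan`/`execute` interfaces, and the interruption mechanism are all placeholders standing in for the VLM-based planner, the vision-language-action policy, and mid-execution redirection.

```python
# Hypothetical sketch of a two-level interruptible loop (illustrative
# names only, not ExpressMM's real interfaces): a high-level planner
# turns user instructions into subgoals, a low-level policy executes
# them, and a user interruption triggers replanning mid-task.
from dataclasses import dataclass, field


@dataclass
class HighLevelPlanner:
    """Stand-in for the VLM-based language-guided planner."""

    def plan(self, instruction: str) -> list[str]:
        # A real planner would query a vision-language model; here we
        # just expand the instruction into placeholder subgoals.
        return [f"approach({instruction})",
                f"manipulate({instruction})",
                f"signal_intent({instruction})"]


@dataclass
class LowLevelPolicy:
    """Stand-in for the vision-language-action policy."""
    log: list[str] = field(default_factory=list)

    def execute(self, subgoal: str) -> None:
        self.log.append(subgoal)  # would command the mobile manipulator


def run_interruptible(planner, policy, instruction, interruptions):
    """Execute subgoals, replanning whenever the user interrupts.

    `interruptions` maps a step index to a new instruction, simulating
    a user redirecting the robot during execution."""
    queue = planner.plan(instruction)
    step = 0
    while queue:
        if step in interruptions:                      # user spoke up
            queue = planner.plan(interruptions[step])  # replan from scratch
        policy.execute(queue.pop(0))
        step += 1
    return policy.log


policy = LowLevelPolicy()
log = run_interruptible(HighLevelPlanner(), policy,
                        "hand me the bracket",
                        {1: "fetch the screwdriver"})
```

In this toy run, the robot starts approaching the bracket, is redirected at step 1, and finishes the screwdriver subgoals instead; the point is only that redirection replaces the remaining subgoal queue rather than waiting for the original task to finish.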

Abstract

Mobile manipulators are increasingly deployed in human-centered environments to perform tasks. While completing such tasks, they should also be able to communicate their intent to the people around them using expressive robot behaviors. Prior work on expressive robot behaviors has relied on preprogrammed or learning-from-demonstration-based expressive motions and large language model-generated high-level interactions. Most of these existing approaches do not consider human-robot interactions (HRI) in which users may interrupt, modify, or redirect a robot's actions during task execution. In this paper, we develop ExpressMM, a novel framework that integrates a high-level language-guided planner, based on a vision-language model for perception and conversational reasoning, with a low-level vision-language-action policy to generate expressive robot behaviors during collaborative HRI tasks. Furthermore, ExpressMM supports interruptible interactions to accommodate updated or redirected instructions from users. We demonstrate ExpressMM on a mobile manipulator assisting a human in a collaborative assembly scenario and conduct an audience-based evaluation of live HRI demonstrations. Questionnaire results show that the ExpressMM-enabled expressive behaviors helped observers clearly interpret the robot's actions and intentions while supporting socially appropriate and understandable interactions. Participants also reported that the robot was useful for collaborative tasks and behaved predictably and safely during the demonstrations, fostering positive perceptions of its usefulness, safety, and predictability.