IAM: Identity-Aware Human Motion and Shape Joint Generation

arXiv cs.CV / 4/29/2026


Key Points

  • The paper argues that current text-driven human motion generation often assumes identity-neutral (canonical) body representations, which can produce physically inconsistent motions by ignoring how morphology affects dynamics.
  • It proposes an identity-aware framework that models the coupling between body shape and motion behavior, using identity signals derived from multimodal inputs like natural language and visual cues.
  • The work introduces a joint motion-and-shape generation approach that synthesizes both motion sequences and body shape parameters together, so identity information can directly modulate motion dynamics.
  • Experiments on motion-capture datasets and large-scale in-the-wild videos show improved motion realism and better consistency between generated motion and identity cues while preserving high motion quality.
  • The authors provide a project page with further details; the work is an early research announcement, listed as arXiv:2604.25164v1.

Abstract

Recent advances in text-driven human motion generation enable models to synthesize realistic motion sequences from natural language descriptions. However, most existing approaches assume identity-neutral motion and generate movements using a canonical body representation, ignoring the strong influence of body morphology on motion dynamics. In practice, attributes such as body proportions, mass distribution, and age significantly affect how actions are performed, and neglecting this coupling often leads to physically inconsistent motions. We propose an identity-aware motion generation framework that explicitly models the relationship between body morphology and motion dynamics. Instead of relying on explicit geometric measurements, identity is represented using multimodal signals, including natural language descriptions and visual cues. We further introduce a joint motion-shape generation paradigm that simultaneously synthesizes motion sequences and body shape parameters, allowing identity cues to directly modulate motion dynamics. Extensive experiments on motion capture datasets and large-scale in-the-wild videos demonstrate improved motion realism and motion-identity consistency while maintaining high motion quality. Project page: https://vjwq.github.io/IAM
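The joint motion-and-shape paradigm described above can be sketched in a few lines: a shared identity embedding, fused from language and visual cues, conditions the sampling of both shape parameters and the motion sequence, so the two stay coupled. This is a minimal illustrative skeleton, not the paper's implementation; all dimensions, function names, and the random linear maps (stand-ins for trained networks) are hypothetical assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (assumptions, not from the paper):
# identity embedding, SMPL-like shape betas, per-frame pose, frame count
D_ID, D_SHAPE, D_POSE, T = 64, 10, 66, 120

def encode_identity(text_feat, visual_feat):
    """Fuse multimodal identity cues (language + visual) into one embedding.
    A simple average stands in for a learned multimodal encoder."""
    return 0.5 * (text_feat + visual_feat)

def joint_generate(identity):
    """Jointly sample body-shape parameters and a motion sequence,
    both conditioned on the SAME identity embedding, so shape and
    dynamics remain coupled. Random linear maps stand in for a
    trained generative model."""
    W_shape = rng.standard_normal((D_ID, D_SHAPE)) * 0.1
    W_pose = rng.standard_normal((D_ID, D_POSE)) * 0.1
    shape = identity @ W_shape            # (D_SHAPE,) e.g. SMPL-like betas
    base = identity @ W_pose              # identity-modulated pose bias
    # Motion: identity-dependent base pose plus per-frame variation
    motion = base + 0.05 * rng.standard_normal((T, D_POSE))
    return shape, motion

text_feat = rng.standard_normal(D_ID)
visual_feat = rng.standard_normal(D_ID)
ident = encode_identity(text_feat, visual_feat)
shape, motion = joint_generate(ident)
print(shape.shape, motion.shape)  # (10,) (120, 66)
```

The key design point mirrored here is that identity conditions both outputs through one embedding, rather than generating motion against a canonical body and attaching a shape afterward.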