HumanScore: Benchmarking Human Motions in Generated Videos

arXiv cs.CV / 4/23/2026

Key Points

  • The paper introduces HumanScore, a systematic evaluation framework to measure how accurately AI video generation models reproduce human body motion and dynamics, beyond just visual realism.
  • HumanScore uses six interpretable metrics covering kinematic plausibility, temporal stability, and biomechanical consistency to provide fine-grained diagnostics.
  • The authors use carefully designed prompts to elicit diverse movements at varying intensities, then evaluate the resulting videos from thirteen state-of-the-art models.
  • The analysis finds recurring failure modes such as temporal jitter, anatomically implausible poses, and motion drift, and highlights gaps between perceptual plausibility and biomechanical fidelity.
  • The framework delivers robust quantitative model rankings grounded in physically meaningful criteria, enabling clearer comparisons of motion quality across models.
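
To make the failure modes named above more concrete, here is a minimal sketch of how temporal jitter and anatomically inconsistent limb lengths could be quantified from per-frame 2D joint positions. This summary does not give the paper's actual metric definitions, so everything here is an assumption for illustration: the function names `temporal_jitter` and `bone_length_variation`, the input layout (a list of frames, each a list of `(x, y)` joints), and the choice of second finite differences and coefficient of variation are hypothetical stand-ins, not HumanScore's metrics.

```python
import math

# Hypothetical helpers for illustration only; NOT the paper's metric definitions.
# Input assumption: `frames` is a list of frames, each a list of (x, y) joints,
# with the same joint order in every frame.

def temporal_jitter(frames):
    """Mean magnitude of the second finite difference (per-joint acceleration)."""
    if len(frames) < 3:
        return 0.0  # acceleration needs at least three frames
    total, count = 0.0, 0
    for prev, cur, nxt in zip(frames, frames[1:], frames[2:]):
        for (x0, y0), (x1, y1), (x2, y2) in zip(prev, cur, nxt):
            ax = x0 - 2 * x1 + x2
            ay = y0 - 2 * y1 + y2
            total += math.hypot(ax, ay)
            count += 1
    return total / count if count else 0.0


def bone_length_variation(frames, bone):
    """Coefficient of variation of the distance between two joints over time.

    A rigid limb should keep a roughly constant length, so larger values
    suggest stretching or shrinking between frames.
    """
    i, j = bone
    lengths = [math.dist(frame[i], frame[j]) for frame in frames]
    mean = sum(lengths) / len(lengths)
    if mean == 0:
        return 0.0
    variance = sum((length - mean) ** 2 for length in lengths) / len(lengths)
    return math.sqrt(variance) / mean


# Usage: two joints moving right at constant speed, versus one joint shaking.
smooth = [[(float(t), 0.0), (float(t), 1.0)] for t in range(6)]
shaky = [[(float(t), 0.0), (t + 0.1 * (-1) ** t, 1.0)] for t in range(6)]
print(temporal_jitter(smooth), temporal_jitter(shaky))
```

Under these assumptions, constant-velocity motion scores zero jitter, while frame-to-frame oscillation raises the score. In practice the joint positions would come from a pose estimator and would need normalizing for subject scale and frame rate before scores are compared across models.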

Abstract

Recent advances in model architectures, compute, and data scale have driven rapid progress in video generation, producing increasingly realistic content. Yet, no prior method systematically measures how faithfully these systems render human bodies and motion dynamics. In this paper, we present HumanScore, a systematic framework to evaluate the quality of human motions in AI-generated videos. HumanScore defines six interpretable metrics spanning kinematic plausibility, temporal stability, and biomechanical consistency, enabling fine-grained diagnosis beyond visual realism alone. Through carefully designed prompts, we elicit a diverse set of movements at varying intensities and evaluate videos generated by thirteen state-of-the-art models. Our analysis reveals consistent gaps between perceptual plausibility and motion biomechanical fidelity, identifies recurrent failure modes (e.g., temporal jitter, anatomically implausible poses, and motion drift), and produces robust model rankings from quantitative and physically meaningful criteria.