HuM-Eval: A Coarse-to-Fine Framework for Human-Centric Video Evaluation
arXiv cs.CV / 4/29/2026
Key Points
- The HuM-Eval paper introduces a human-centric video evaluation framework to better judge generated human-motion quality beyond coarse scene-level metrics.
- It applies a coarse-to-fine pipeline: a vision-language model first assesses overall video quality, then 2D pose estimation checks anatomical correctness, and finally 3D motion analysis evaluates motion stability.
- Experiments report an average human correlation of 58.2%, outperforming prior state-of-the-art baselines.
- The authors also release HuM-Bench, a benchmark with 1,000 diverse prompts, and use it to evaluate existing text-to-video models, supporting progress toward next-generation human motion generation.
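The three-stage pipeline described above can be sketched as a simple score aggregator. Everything here is an illustrative assumption — the function names, the placeholder scores, and the equal-weight averaging are not taken from the paper:

```python
# Hypothetical sketch of a coarse-to-fine human-centric video evaluator.
# All names and the score-combination scheme are assumptions for
# illustration, not HuM-Eval's actual implementation.

def vlm_overall_quality(video: str) -> float:
    """Stage 1 (coarse): a vision-language model scores overall quality."""
    return 0.8  # placeholder score in [0, 1]

def pose2d_anatomy_score(video: str) -> float:
    """Stage 2 (fine): 2D pose estimation checks anatomical correctness."""
    return 0.7  # placeholder score

def motion3d_stability_score(video: str) -> float:
    """Stage 3 (fine): 3D motion analysis rates motion stability."""
    return 0.9  # placeholder score

def hum_eval_score(video: str) -> float:
    # Equal weighting of the three stages is an assumption; the paper
    # may combine stage scores differently.
    scores = [
        vlm_overall_quality(video),
        pose2d_anatomy_score(video),
        motion3d_stability_score(video),
    ]
    return sum(scores) / len(scores)

print(round(hum_eval_score("clip.mp4"), 3))  # → 0.8
```

In practice each stage would wrap a real model (a VLM judge, a 2D pose estimator, a 3D motion reconstructor); the sketch only shows how coarse and fine signals could feed one aggregate score.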