AnimationBench: Are Video Models Good at Character-Centric Animation?

arXiv cs.CV / 4/17/2026

Key Points

  • The paper introduces AnimationBench, a new benchmark designed specifically to evaluate character-centric animation in animation-style image-to-video (I2V) generation, where realism-focused benchmarks fall short.
  • AnimationBench measures animation quality by operationalizing the Twelve Basic Principles of Animation and IP Preservation, and adds broader quality dimensions such as semantic consistency, motion rationality, and camera motion consistency.
  • It supports both standardized closed-set evaluations for reproducible comparisons and flexible open-set evaluations for diagnostic, custom analysis in open-domain scenarios.
  • The benchmark uses visual-language models to enable scalable scoring, and experimental results indicate strong alignment with human judgments while revealing animation-specific differences among state-of-the-art I2V models.

Abstract

Video generation has advanced rapidly, with recent methods producing increasingly convincing animated results. However, existing benchmarks, largely designed for realistic videos, struggle to evaluate animation-style generation with its stylized appearance, exaggerated motion, and character-centric consistency. They also rely on fixed prompt sets and rigid pipelines, offering limited flexibility for open-domain content and custom evaluation needs. To address this gap, we introduce AnimationBench, the first systematic benchmark for evaluating animation image-to-video generation. AnimationBench operationalizes the Twelve Basic Principles of Animation and IP Preservation into measurable evaluation dimensions, together with Broader Quality Dimensions including semantic consistency, motion rationality, and camera motion consistency. The benchmark supports both a standardized closed-set evaluation for reproducible comparison and a flexible open-set evaluation for diagnostic analysis, and leverages visual-language models for scalable assessment. Extensive experiments show that AnimationBench aligns well with human judgment and exposes animation-specific quality differences overlooked by realism-oriented benchmarks, leading to more informative and discriminative evaluation of state-of-the-art I2V models.
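
The abstract does not say how alignment with human judgment is measured. One common way to check such alignment is a rank correlation between automatic scores and human ratings; the sketch below computes Spearman's rank correlation in plain Python. This is an assumption for illustration, not the paper's stated protocol, and the function names `average_ranks` and `spearman` as well as all the numbers are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of 1-based positions pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-model scores (made-up numbers, not results from the paper).
vlm_scores = [0.62, 0.71, 0.55, 0.80]    # automatic score per model
human_scores = [3.1, 3.9, 2.8, 3.6]      # mean human rating per model

rho = spearman(vlm_scores, human_scores)
print(f"Spearman rho = {rho:.2f}")
```

A value near 1 would mean the automatic scores rank the models in nearly the same order as human raters do; in practice, a library routine such as `scipy.stats.spearmanr` would usually replace the hand-written version.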