VAnim: Rendering-Aware Sparse State Modeling for Structure-Preserving Vector Animation

arXiv cs.CV / 5/5/2026


Key Points

  • The paper introduces VAnim, the first LLM-based, open-domain text-to-SVG animation framework, aimed at producing professional, structure-editable animations without breaking document topology.
  • Instead of generating full frame sequences, VAnim models animation as Sparse State Updates (SSU) on a persistent SVG DOM tree, achieving more than 9.8× sequence-length compression while preserving the DOM structure.
  • It proposes an Identification-First Motion Planning approach to map textual instructions to explicit visual entities for finer control over the resulting motion.
  • To handle the non-differentiability of SVG rendering, VAnim uses Rendering-Aware Reinforcement Learning with Group Relative Policy Optimization (GRPO) and a hybrid reward driven by a video perception encoder for high-fidelity visual alignment.
  • The authors also release SVGAnim-134k, a new benchmark for vector animation, and report gains over prior methods in semantic alignment and structural validity, with appendix metrics further supporting motion quality and identity preservation.
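To make the Sparse State Update idea concrete, here is a minimal illustrative sketch (not the paper's implementation; all names and data are hypothetical). Instead of emitting a full SVG document per frame, each keyframe lists only the `(element_id, attribute, value)` changes applied to one persistent element tree, so untouched elements are preserved by construction:

```python
import copy

def apply_updates(dom, updates):
    """Apply a list of (element_id, attribute, value) updates in place."""
    for elem_id, attr, value in updates:
        dom[elem_id][attr] = value
    return dom

# Persistent DOM: every frame shares this structure by construction.
base_dom = {
    "wing_left":  {"d": "M0 0 L10 5",  "fill": "#333"},
    "wing_right": {"d": "M0 0 L-10 5", "fill": "#333"},
    "body":       {"d": "M-2 0 L2 0",  "fill": "#777"},
}

# Keyframes encode only what changes ("sparse"): non-participating
# elements such as "body" never appear, so their topology is untouched.
keyframes = [
    [("wing_left", "d", "M0 0 L10 -5"), ("wing_right", "d", "M0 0 L-10 -5")],
    [("wing_left", "d", "M0 0 L10 5"),  ("wing_right", "d", "M0 0 L-10 5")],
]

frames = []
state = copy.deepcopy(base_dom)
for kf in keyframes:
    state = apply_updates(state, kf)
    frames.append(copy.deepcopy(state))
```

The sequence-length savings come from the same source as the structure preservation: the model only generates the sparse diffs, never the static majority of the document.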

Abstract

Scalable Vector Graphics (SVG) animation generation is pivotal for professional design due to the format's structural editability and resolution independence. However, this task remains challenging as it requires bridging discrete code representations with continuous visual dynamics. Existing optimization-based methods often destroy topological consistency, while general-purpose LLMs rely on rigid CSS/SMIL transformations, failing to model geometry-level non-rigid deformations. To address these limitations, we present VAnim, the first LLM-based framework for open-domain text-to-SVG animation. We reconceptualize animation not as sequence generation, but as Sparse State Updates (SSU) on a persistent SVG DOM tree. This paradigm compresses sequence length by over 9.8x while preserving the SVG DOM structure and non-participating elements by construction. To enable precise control, we propose an Identification-First Motion Planning mechanism that grounds textual instructions in explicit visual entities. Furthermore, to overcome the non-differentiable nature of SVG rendering, we employ Rendering-Aware Reinforcement Learning via Group Relative Policy Optimization (GRPO). By leveraging a hybrid reward from a state-of-the-art video perception encoder, we align discrete code updates with high-fidelity visual feedback. We also introduce SVGAnim-134k, the first benchmark for vector animation. Extensive experiments demonstrate that VAnim significantly outperforms state-of-the-art baselines in semantic alignment and structural validity, with additional appendix metrics further validating motion quality and identity preservation.
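Why GRPO fits the non-differentiable rendering setting: the rasterizer and the video perception encoder only produce scalar rewards, never gradients, so training compares a group of sampled generations against each other. The sketch below shows just that group-relative advantage step; it is a generic GRPO-style normalization, not VAnim's actual training code, and the reward values are made up:

```python
import statistics

def group_relative_advantages(rewards):
    """Normalize each sampled generation's scalar reward against its group.

    Rewards can come from any non-differentiable pipeline (e.g. render
    the SVG animation, score the video with a perception encoder); only
    the scalars enter the policy-gradient update.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + 1e-8) for r in rewards]

# Hypothetical rewards for 4 sampled animations of the same prompt:
advs = group_relative_advantages([0.2, 0.5, 0.8, 0.5])
# Above-average samples get positive advantage, below-average negative,
# and the group mean cancels out, so no learned value model is needed.
```

This group baseline is what lets GRPO drop the critic used in PPO-style methods: each sample's advantage is defined relative to its siblings rather than to a value estimate.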