DanceCrafter: Fine-Grained Text-Driven Controllable Dance Generation via Choreographic Syntax

arXiv cs.AI / 4/22/2026


Key Points

  • The paper proposes a new theoretical framework called “Choreographic Syntax” to better describe and annotate complex, text-driven controllable dance instructions.
  • It builds “DanceFlow,” a highly fine-grained dataset combining professional dance archives with high-fidelity motion capture, totaling 41 hours of motion and 6.34 million words of descriptions.
  • It introduces “DanceCrafter,” a tailored motion-transformer model based on the Momentum Human Rig, using a continuous manifold motion representation and hybrid normalization to improve training stability.
  • The model also uses an anatomy-aware loss to explicitly regulate the naturally decoupled movements of different body parts, enabling stable, high-fidelity dance generation.
  • Extensive evaluations and user studies report state-of-the-art results in motion quality, fine-grained controllability, and naturalness of generated sequences.
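The paper does not publish its loss formulation in this summary, but the idea of an anatomy-aware loss that treats body parts separately can be sketched as a per-part weighted reconstruction objective. The joint grouping, weights, and function below are hypothetical illustrations, not the Momentum Human Rig's actual partitioning:

```python
import numpy as np

# Hypothetical joint groups for a simplified 15-joint skeleton;
# the real rig's partitioning is not specified in this summary.
BODY_PARTS = {
    "torso": [0, 1, 2],
    "left_arm": [3, 4, 5],
    "right_arm": [6, 7, 8],
    "left_leg": [9, 10, 11],
    "right_leg": [12, 13, 14],
}

def anatomy_aware_loss(pred, target, part_weights=None):
    """Per-body-part weighted L2 reconstruction loss (illustrative).

    pred, target: (T, J, 3) arrays of joint positions over T frames.
    part_weights: optional dict of part name -> scalar weight, so that
    highly articulated parts (e.g. arms) are penalized independently
    rather than being averaged away by torso error.
    """
    part_weights = part_weights or {name: 1.0 for name in BODY_PARTS}
    loss = 0.0
    for name, joints in BODY_PARTS.items():
        diff = pred[:, joints, :] - target[:, joints, :]
        loss += part_weights[name] * np.mean(diff ** 2)
    return loss / len(BODY_PARTS)
```

Averaging within each part before summing keeps a small, fast-moving part (a hand, say) from being drowned out by the many torso joints in a single global mean.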

Abstract

Text-driven controllable dance generation remains under-explored, primarily due to the severe scarcity of high-quality datasets and the inherent difficulty of articulating complex choreographies. Characterizing dance is particularly challenging owing to its intricate spatial dynamics, strong directionality, and the highly decoupled movements of distinct body parts. To overcome these bottlenecks, we bridge principles from dance studies, human anatomy, and biomechanics to propose *Choreographic Syntax*, a novel theoretical framework with a tailored annotation system. Grounded in this syntax, we combine professional dance archives with high-fidelity motion capture data to construct **DanceFlow**, the most fine-grained dance dataset to date. It encompasses 41 hours of high-quality motions paired with 6.34 million words of detailed descriptions. At the model level, we introduce **DanceCrafter**, a tailored motion transformer built upon the Momentum Human Rig. To circumvent optimization instabilities, we construct a continuous manifold motion representation paired with a hybrid normalization strategy. Furthermore, we design an anatomy-aware loss to explicitly regulate the decoupled nature of body parts. Together, these adaptations empower DanceCrafter to achieve the high-fidelity and stable generation of complex dance sequences. Extensive evaluations and user studies demonstrate our state-of-the-art performance in motion quality, fine-grained controllability, and generation naturalness.
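The abstract's "continuous manifold motion representation" is not specified here, but a widely used continuous rotation encoding in motion modeling is the 6D representation (the first two columns of a rotation matrix, re-orthonormalized via Gram–Schmidt on decode). The sketch below shows that idea only as an assumed example of such a representation, not the paper's actual scheme:

```python
import numpy as np

def rotmat_to_6d(R):
    """Encode a 3x3 rotation matrix as its first two columns (6 numbers).

    Unlike Euler angles or quaternions, this encoding is continuous,
    which tends to make gradient-based training more stable.
    """
    return np.concatenate([R[:, 0], R[:, 1]])

def sixd_to_rotmat(d6):
    """Decode 6 numbers back to a valid rotation via Gram-Schmidt."""
    a1, a2 = d6[:3], d6[3:]
    b1 = a1 / np.linalg.norm(a1)          # normalize first column
    b2 = a2 - np.dot(b1, a2) * b1         # remove component along b1
    b2 = b2 / np.linalg.norm(b2)
    b3 = np.cross(b1, b2)                 # third column by cross product
    return np.stack([b1, b2, b3], axis=1)
```

The decode step always yields an orthonormal matrix, so the network can output unconstrained 6-vectors and still produce valid joint rotations.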