SentiAvatar: Towards Expressive and Interactive Digital Humans

arXiv cs.CV / 4/6/2026

Key Points

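SentiAvatar is a framework for building expressive, interactive 3D digital humans that generate facial expressions, gestures, and speech-synchronized motion in real time.
The work jointly addresses three challenges: (1) the lack of large-scale, high-quality multimodal data, (2) robust semantic-to-motion mapping, and (3) frame-level synchronization between speech prosody and motion.
To address them, the authors build SuSuInterActs (21K clips, 37 hours), a dialogue corpus of a single character recorded with optical motion capture, and pre-train a Motion Foundation Model on 200K+ motion sequences.
An audio-aware plan-then-infill architecture (sentence-level planning followed by frame-level interpolation) produces motion that is both contextually appropriate and aligned with speech rhythm; the authors report results surpassing prior methods on SuSuInterActs and BEATv2.
The source code, model, and dataset are publicly released; the system generates about 6 seconds of output in 0.3 seconds and supports unlimited multi-turn streaming.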
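
The throughput and corpus figures above imply two derived numbers, and a minimal sketch of the arithmetic may help put them in context. The function names are illustrative only, and because 21K is a rounded count, the average clip length computed here is approximate.

```python
def real_time_factor(compute_seconds: float, output_seconds: float) -> float:
    """Compute time divided by duration of generated output (below 1 is faster than real time)."""
    return compute_seconds / output_seconds


def mean_clip_seconds(total_hours: float, num_clips: int) -> float:
    """Average clip duration in seconds for a corpus of the given size."""
    return total_hours * 3600 / num_clips


# Figures reported in the summary: about 6 s of output in 0.3 s; 37 hours over roughly 21K clips.
rtf = real_time_factor(0.3, 6.0)
avg_clip = mean_clip_seconds(37, 21_000)  # assumes exactly 21,000 clips

print(f"real-time factor: {rtf:.2f}")         # about 0.05, i.e. roughly 20x faster than real time
print(f"mean clip length: {avg_clip:.1f} s")  # a little over 6 s under the assumption above
```

A real-time factor well below 1 is what makes streaming plausible: each chunk of motion can be generated long before the previous chunk finishes playing. Under the stated assumption, the average corpus clip is also close in length to the roughly 6-second generation window.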

Abstract

We present SentiAvatar, a framework for building expressive interactive 3D digital humans, and use it to create SuSu, a virtual character that speaks, gestures, and emotes in real time. Achieving such a system remains challenging, as it requires jointly addressing three key problems: the lack of large-scale, high-quality multimodal data, robust semantic-to-motion mapping, and fine-grained frame-level motion-prosody synchronization. To solve these problems, first, we build SuSuInterActs (21K clips, 37 hours), a dialogue corpus captured via optical motion capture around a single character with synchronized speech, full-body motion, and facial expressions. Second, we pre-train a Motion Foundation Model on 200K+ motion sequences, equipping it with rich action priors that go well beyond the conversation. We then propose an audio-aware plan-then-infill architecture that decouples sentence-level semantic planning from frame-level prosody-driven interpolation, so that generated motions are both semantically appropriate and rhythmically aligned with speech. Experiments show that SentiAvatar achieves state-of-the-art on both SuSuInterActs (R@1 43.64%, nearly 2 times the best baseline) and BEATv2 (FGD 4.941, BC 8.078), producing 6s of output in 0.3s with unlimited multi-turn streaming. The source code, model, and dataset are available at https://sentiavatar.github.io.
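
To make the plan-then-infill idea concrete, here is a deliberately simplified toy. It is not the paper's learned architecture: it assumes a 1-D "pose" value, sparse keyframes standing in for the sentence-level plan, and a per-frame energy list standing in for prosody. The names `plan_keyframes` and `infill` are hypothetical. The infill stage moves between planned keyframes only as fast as the audio energy allows, so motion concentrates where the speech is emphasized.

```python
from typing import List, Tuple

Keyframe = Tuple[int, float]  # (frame index, 1-D pose value); a toy stand-in for a full-body pose


def plan_keyframes(targets: List[float], num_frames: int) -> List[Keyframe]:
    """Stage 1 (toy 'plan'): spread one target pose per semantic unit evenly over the clip."""
    if len(targets) < 2 or num_frames < len(targets):
        raise ValueError("need at least two targets and at least one frame per target")
    last = num_frames - 1
    return [(i * last // (len(targets) - 1), t) for i, t in enumerate(targets)]


def infill(keyframes: List[Keyframe], energy: List[float]) -> List[float]:
    """Stage 2 (toy 'infill'): fill frames between keyframes, paced by per-frame audio energy.

    energy[f] controls how much of a segment's motion happens between frame f and f + 1.
    With constant energy this reduces to plain linear interpolation.
    """
    if len(energy) <= keyframes[-1][0]:
        raise ValueError("energy must cover every frame up to the last keyframe")
    frames: List[float] = []
    for (f0, p0), (f1, p1) in zip(keyframes, keyframes[1:]):
        segment = energy[f0:f1]
        total = sum(segment)
        done = 0.0
        for k, e in enumerate(segment):
            # Fraction of this segment's motion completed on arrival at frame f0 + k.
            alpha = done / total if total > 0 else k / len(segment)
            frames.append(p0 + alpha * (p1 - p0))
            done += e
    frames.append(keyframes[-1][1])  # the final keyframe is reached exactly
    return frames


keys = plan_keyframes([0.0, 1.0], num_frames=5)
steady = infill(keys, [1, 1, 1, 1, 1])    # constant energy
stressed = infill(keys, [0, 0, 1, 1, 0])  # energy concentrated mid-clip
print(steady)
print(stressed)
```

With constant energy the toy produces an evenly paced transition, while with energy concentrated mid-clip the pose holds still and then moves during the emphasized frames. Both runs hit the same planned keyframes, which is the point of the decoupling: what to do is decided once per sentence, and when exactly to move is decided per frame from the audio.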