LPM 1.0: Video-based Character Performance Model

arXiv cs.CV / 4/10/2026


Key Points

  • The paper introduces LPM 1.0 (Large Performance Model), which learns video-based character performance—intent, emotion, and personality—from audio-visual conversation, sidestepping traditional 3D character pipelines.
  • It targets a stated “performance trilemma” by jointly improving expressiveness, real-time inference, and long-horizon identity stability, focusing specifically on single-person full-duplex audio-visual conversational performance.
  • LPM 1.0 builds a multimodal human-centric dataset using strict filtering and identity-aware multi-reference extraction, then trains a 17B-parameter Diffusion Transformer for controllable, identity-consistent generation via multimodal conditioning.
  • The model is distilled into an Online LPM causal streaming generator designed for low-latency, infinite-length interactions, enabling real-time listening/speaking video synthesis from user audio and synthesized speech.
  • The work also proposes LPM-Bench, a new benchmark for interactive character performance, reporting state-of-the-art results across evaluated dimensions.

Abstract

Performance, the externalization of intent, emotion, and personality through visual, vocal, and temporal behavior, is what makes a character alive. Learning such performance from video is a promising alternative to traditional 3D pipelines. However, existing video models struggle to jointly achieve high expressiveness, real-time inference, and long-horizon identity stability, a tension we call the performance trilemma. Conversation is the most comprehensive performance scenario, as characters simultaneously speak, listen, react, and emote while maintaining identity over time. To address this, we present LPM 1.0 (Large Performance Model), focusing on single-person full-duplex audio-visual conversational performance. Concretely, we build a multimodal human-centric dataset through strict filtering, speaking-listening audio-video pairing, performance understanding, and identity-aware multi-reference extraction; train a 17B-parameter Diffusion Transformer (Base LPM) for highly controllable, identity-consistent performance through multimodal conditioning; and distill it into a causal streaming generator (Online LPM) for low-latency, infinite-length interaction. At inference, given a character image with identity-aware references, LPM 1.0 generates listening videos from user audio and speaking videos from synthesized audio, with text prompts for motion control, all at real-time speed with identity-stable, infinite-length generation. LPM 1.0 thus serves as a visual engine for conversational agents, live streaming characters, and game NPCs. To systematically evaluate this setting, we propose LPM-Bench, the first benchmark for interactive character performance. LPM 1.0 achieves state-of-the-art results across all evaluated dimensions while maintaining real-time inference.
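The abstract's inference setup (chunked audio in, identity-stable video frames out, with a causal cache bounding memory so generation can run indefinitely) can be pictured as a simple streaming loop. The sketch below is purely illustrative and assumes a chunk-based design; the class and function names (`StreamState`, `generate_chunk`, `run_stream`) are hypothetical, not the paper's API, and the "model step" is stubbed out where a distilled causal diffusion step would run.

```python
# Hedged sketch of a full-duplex causal streaming loop in the spirit of
# Online LPM: audio arrives chunk by chunk, video frames are emitted per
# chunk, and a bounded rolling context keeps memory flat for
# infinite-length generation. All names are illustrative assumptions.

from dataclasses import dataclass, field

@dataclass
class StreamState:
    """Carries identity-aware references and causal context across chunks."""
    identity_refs: list                            # reference features for identity stability
    context: list = field(default_factory=list)    # rolling causal cache
    max_context: int = 8                           # cap the cache so long runs don't grow memory

def generate_chunk(state: StreamState, audio_chunk: str, mode: str) -> list:
    """Produce video frames for one audio chunk.

    mode is "listening" (user audio) or "speaking" (synthesized speech);
    a real system would run the distilled causal generator here.
    """
    frames = [f"{mode}:{audio_chunk}:frame{i}" for i in range(2)]  # stub output
    # Append to the causal cache and evict the oldest entries so the
    # stream can continue for an unbounded number of chunks.
    state.context.extend(frames)
    state.context = state.context[-state.max_context:]
    return frames

def run_stream(audio_chunks, modes, identity_refs):
    """Drive the loop over interleaved listening/speaking chunks."""
    state = StreamState(identity_refs=identity_refs)
    video = []
    for chunk, mode in zip(audio_chunks, modes):
        video.extend(generate_chunk(state, chunk, mode))
    return video, state
```

The point of the sketch is the shape of the problem, not the model: per-chunk causal generation with a fixed-size state is what allows low latency and unbounded duration at the same time, which is the real-time leg of the trilemma the paper targets.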