PortraitDirector: A Hierarchical Disentanglement Framework for Controllable and Real-time Facial Reenactment
arXiv cs.CV / 4/22/2026
Key Points
- PortraitDirector proposes a hierarchical, compositional approach to facial reenactment to resolve the common trade-off between expressiveness and fine-grained controllability.
- The framework disentangles facial motion into a Spatial Layer (global head pose plus local expressions with emotional cues filtered out) and a Semantic Layer (global emotion), then recomposes the two into an expressive motion latent.
- An Emotion-Filtering Module based on an information bottleneck helps remove emotional signals from the local expression components to improve disentanglement quality.
- To enable real-time use, the method applies optimizations such as diffusion distillation, causal attention, and VAE acceleration.
- The paper reports streaming 512×512 reenactment at 20 FPS with roughly 800 ms end-to-end latency on a single NVIDIA 5090 GPU.
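
Of the real-time optimizations listed above, causal attention is the easiest to illustrate in isolation. The sketch below is a generic scaled dot-product attention in which each frame attends only to itself and earlier frames, which is what lets a streaming model emit output before future frames arrive. It is not taken from the paper: the function name `causal_attention`, the list-of-lists frame representation, and the toy dimensions are all assumptions made for illustration, and the actual PortraitDirector architecture may differ.

```python
import math


def causal_attention(queries, keys, values):
    """Scaled dot-product attention where frame t sees only frames 0..t.

    Hypothetical helper for illustration; each argument is a list of
    per-frame feature vectors (lists of floats) of equal length.
    """
    dim = len(queries[0])
    outputs = []
    for t, query in enumerate(queries):
        # Causal mask: score the query against current and past keys only.
        scores = [
            sum(a * b for a, b in zip(query, keys[s])) / math.sqrt(dim)
            for s in range(t + 1)
        ]
        # Numerically stable softmax over the visible frames.
        peak = max(scores)
        weights = [math.exp(score - peak) for score in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the visible value vectors.
        outputs.append([
            sum(w * values[s][d] for s, w in enumerate(weights))
            for d in range(len(values[0]))
        ])
    return outputs


# Toy example: three frames with 2-dimensional features.
frames_q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frames_k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frames_v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
result = causal_attention(frames_q, frames_k, frames_v)
```

Because frame 0 can only attend to itself, its output equals its own value vector, and changing a later frame never alters earlier outputs. That property is what allows results for past frames to be cached and reused as new frames stream in.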



