Giving Faces Their Feelings Back: Explicit Emotion Control for Feedforward Single-Image 3D Head Avatars

arXiv cs.CV · April 17, 2026


Key Points

  • The paper introduces a framework for explicit, first-class emotion control in feed-forward, single-image 3D head avatar reconstruction, aiming to decouple emotion from geometry and appearance.
  • It injects emotion into existing architectures using a dual-path modulation approach: geometry-conditioned normalization for separating emotion from speech-driven articulation, and appearance modulation for identity-aware, emotion-dependent visual cues.
  • To train under this setup, the authors build a time-synchronized, emotion-consistent multi-identity dataset by transferring aligned emotional dynamics across different identities.
  • Experiments integrating the method into multiple state-of-the-art backbones show preserved reconstruction/reenactment fidelity while enabling controllable emotion transfer, disentangled manipulation, and smooth emotion interpolation.
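The geometry path described above is a conditioned-normalization scheme: expression parameters are normalized, then rescaled and shifted by factors predicted from an emotion embedding (in the spirit of FiLM/AdaIN-style modulation). The following numpy sketch illustrates the general pattern only; the function name, shapes, and projection matrices are illustrative assumptions, not the paper's actual interface.

```python
import numpy as np

def emotion_conditioned_norm(expr_params, emotion_embed, W_gamma, W_beta, eps=1e-5):
    """Hypothetical FiLM-style modulation: normalize speech-driven
    expression parameters, then apply a per-dimension scale/shift
    predicted from an emotion embedding."""
    # Normalize the speech-driven expression parameters.
    mu, sigma = expr_params.mean(), expr_params.std()
    normed = (expr_params - mu) / (sigma + eps)
    # The emotion embedding predicts scale (gamma) and shift (beta).
    gamma = emotion_embed @ W_gamma
    beta = emotion_embed @ W_beta
    # A zero emotion embedding leaves the normalized parameters unchanged.
    return (1.0 + gamma) * normed + beta

# Toy usage: a 50-D expression vector modulated by an 8-D emotion code.
rng = np.random.default_rng(0)
expr = rng.normal(size=50)
emo = rng.normal(size=8)
W_g = rng.normal(scale=0.1, size=(8, 50))
W_b = rng.normal(scale=0.1, size=(8, 50))
out = emotion_conditioned_norm(expr, emo, W_g, W_b)
print(out.shape)  # (50,)
```

Because the modulation is additive and multiplicative around a normalized signal, the speech-driven articulation survives intact when the emotion code is zero, which is one plausible way to realize the disentanglement the paper describes.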

Abstract

We present a framework for explicit emotion control in feed-forward, single-image 3D head avatar reconstruction. Unlike existing pipelines where emotion is implicitly entangled with geometry or appearance, we treat emotion as a first-class control signal that can be manipulated independently and consistently across identities. Our method injects emotion into existing feed-forward architectures via a dual-path modulation mechanism without modifying their core design. Geometry modulation performs emotion-conditioned normalization in the original parametric space, disentangling emotional state from speech-driven articulation, while appearance modulation captures identity-aware, emotion-dependent visual cues beyond geometry. To enable learning under this setting, we construct a time-synchronized, emotion-consistent multi-identity dataset by transferring aligned emotional dynamics across identities. Integrated into multiple state-of-the-art backbones, our framework preserves reconstruction and reenactment fidelity while enabling controllable emotion transfer, disentangled manipulation, and smooth emotion interpolation, advancing expressive and scalable 3D head avatars.
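Treating emotion as an explicit, continuous control signal is what makes the "smooth emotion interpolation" in the abstract possible: blending two emotion embeddings yields intermediate states. A minimal sketch under that assumption (linear interpolation in embedding space; names and dimensions are hypothetical):

```python
import numpy as np

def interpolate_emotion(e_a, e_b, t):
    """Linearly blend two emotion embeddings; t in [0, 1].
    At t=0 the result is e_a, at t=1 it is e_b."""
    return (1.0 - t) * e_a + t * e_b

# Toy usage: sweep from a "neutral" code to a "happy" code in 5 steps.
neutral = np.zeros(8)
happy = np.ones(8)
steps = [interpolate_emotion(neutral, happy, t) for t in np.linspace(0.0, 1.0, 5)]
```

Each intermediate embedding would then drive the dual-path modulation, producing a gradual transition between emotional states on the same identity.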