One-shot Compositional 3D Head Avatars with Deformable Hair

arXiv cs.CV · April 17, 2026


Key Points

  • The paper presents a one-shot method to generate a full 3D head avatar from a single image, addressing the long-standing problem of unrealistic hair motion in one-shot holistic approaches.
  • It explicitly decouples hair from the face by using separate deformation models and integrating both parts into a unified rendering pipeline for more natural geometry and deformations.
  • The approach preserves fine textures by lifting both the original portrait and a hair-removed (bald) version into dense, detail-rich 3D Gaussian Splatting (3DGS) representations.
  • The bald face 3DGS is rigged to a FLAME mesh via non-rigid registration for mesh-following animation, while hair is extracted as isolated hair Gaussians using semantic supervision and boundary-aware reassignment.
  • Hair deformation is driven by a cage structure simulated with Position-Based Dynamics (PBD), yielding physically plausible transformations of the hair Gaussians under head motion, gravity, and inertia; the authors report improved perceptual realism over prior one-shot methods.
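To make the PBD idea above concrete, here is a minimal, self-contained sketch of one position-based dynamics step for a single pinned chain of particles (a stand-in for one edge of the hair cage). This is illustrative only: the particle layout, time step, and iteration count are assumptions, not the paper's implementation, which simulates a full cage around the hair Gaussians.

```python
# Minimal Position-Based Dynamics (PBD) sketch: a chain of particles in 2D
# hangs under gravity, with the root pinned to the head. Hypothetical
# parameters; NOT the paper's cage simulation.

def pbd_step(pos, prev, pinned, rest_len, dt=1 / 60, gravity=-9.8, iters=10):
    n = len(pos)
    # 1) Predict positions via Verlet-style integration (inertia + gravity).
    pred = []
    for i in range(n):
        if i in pinned:
            pred.append(pos[i])
            continue
        vx = pos[i][0] - prev[i][0]
        vy = pos[i][1] - prev[i][1]
        pred.append((pos[i][0] + vx, pos[i][1] + vy + gravity * dt * dt))
    # 2) Project distance constraints: each segment keeps its rest length.
    for _ in range(iters):
        for i in range(n - 1):
            ax, ay = pred[i]
            bx, by = pred[i + 1]
            dx, dy = bx - ax, by - ay
            d = (dx * dx + dy * dy) ** 0.5 or 1e-9
            corr = (d - rest_len) / d * 0.5
            if i in pinned:  # pinned end: the free particle absorbs it all
                pred[i + 1] = (bx - dx * corr * 2, by - dy * corr * 2)
            else:
                pred[i] = (ax + dx * corr, ay + dy * corr)
                pred[i + 1] = (bx - dx * corr, by - dy * corr)
    return pred, pos  # new positions; old positions become "prev"

# Usage: a 4-particle strand starting horizontal swings down under gravity.
rest = 0.1
pos = [(i * rest, 0.0) for i in range(4)]
prev = list(pos)
for _ in range(200):
    pos, prev = pbd_step(pos, prev, pinned={0}, rest_len=rest)
```

In the paper's setting, each simulated cage vertex would then drive the hair Gaussians embedded in it, so the splats inherit the physically plausible motion without being simulated individually.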

Abstract

We propose a compositional method for constructing a complete 3D head avatar from a single image. Prior one-shot holistic approaches frequently fail to produce realistic hair dynamics during animation, largely due to inadequate decoupling of hair from the facial region, resulting in entangled geometry and unnatural deformations. Our method explicitly decouples hair from the face, modeling these components using distinct deformation paradigms while integrating them into a unified rendering pipeline. Furthermore, by leveraging image-to-3D lifting techniques, we preserve fine-grained textures from the input image to the greatest extent possible, effectively mitigating the common issue of high-frequency information loss in generalized models. Specifically, given a frontal portrait image, we first perform hair removal to obtain a bald image. Both the original image and the bald image are then lifted to dense, detail-rich 3D Gaussian Splatting (3DGS) representations. For the bald 3DGS, we rig it to a FLAME mesh via non-rigid registration with a prior model, enabling natural deformation that follows the mesh triangles during animation. For the hair component, we employ semantic label supervision combined with a boundary-aware reassignment strategy to extract a clean and isolated set of hair Gaussians. To control hair deformation, we introduce a cage structure that supports Position-Based Dynamics (PBD) simulation, allowing realistic and physically plausible transformations of the hair Gaussian primitives under head motion, gravity, and inertial effects. Striking qualitative results, including dynamic animations under diverse head motions, gravity effects, and expressions, showcase substantially more realistic hair behavior alongside faithfully preserved facial details, outperforming state-of-the-art one-shot methods in perceptual realism.
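The "deformation that follows the mesh triangles" described above is commonly realized by binding each Gaussian center to a triangle and re-posing it when the mesh deforms. The sketch below shows one such binding (barycentric coordinates plus a normal offset); it is a hedged illustration of the general technique, not the paper's FLAME rigging or non-rigid registration, and all function names are hypothetical.

```python
# Hypothetical mesh-following binding: a point is encoded as barycentric
# coordinates on a triangle plus a signed offset along the triangle
# normal, then reconstructed on the deformed triangle.

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)

def unit_normal(tri):
    n = cross(sub(tri[1], tri[0]), sub(tri[2], tri[0]))
    return scale(n, 1.0 / sum(x * x for x in n) ** 0.5)

def foot_point(tri, bary):
    # Point on the triangle with the given barycentric coordinates.
    return add(add(scale(tri[0], bary[0]), scale(tri[1], bary[1])),
               scale(tri[2], bary[2]))

def bind(point, tri, bary):
    # Signed height of the point above its barycentric foot.
    d = sub(point, foot_point(tri, bary))
    h = sum(a * b for a, b in zip(d, unit_normal(tri)))
    return bary, h

def repose(bary, h, tri):
    return add(foot_point(tri, bary), scale(unit_normal(tri), h))

# Usage: a Gaussian center 0.05 above a triangle's centroid follows the
# triangle when the mesh translates upward by one unit.
tri0 = ((0, 0, 0), (1, 0, 0), (0, 1, 0))
bary, h = bind((1 / 3, 1 / 3, 0.05), tri0, (1 / 3, 1 / 3, 1 / 3))
tri1 = tuple(add(v, (0, 0, 1)) for v in tri0)
p1 = repose(bary, h, tri1)  # → (1/3, 1/3, 1.05)
```

Because the encoding is local to each triangle, the same recipe carries rotations and expression-driven deformations of the face mesh over to the bound Gaussians.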