AHS: Adaptive Head Synthesis via Synthetic Data Augmentations

arXiv cs.CV / 4/20/2026

Key Points

  • The paper introduces Adaptive Head Synthesis (AHS) to improve portrait head-swapping/manipulation by training on full upper-body images rather than face-centered crops with limited angles.
  • AHS uses a new head-reenacted synthetic data augmentation strategy to ease the constraints of self-supervised training while avoiding the need for paired training data.
  • Experimental results indicate AHS generalizes better across a wide range of head poses, facial expressions, and hairstyles, producing more visually coherent blends beyond just the face region.
  • The method demonstrates strong robustness in preserving facial identity even under drastic expression changes and large head-pose variations, while also maintaining accessories accurately.

Abstract

Recent digital media advancements have created increasing demands for sophisticated portrait manipulation techniques, particularly head swapping, where one's head is seamlessly integrated with another's body. However, current approaches predominantly rely on face-centered cropped data with limited view angles, significantly restricting their real-world applicability. They struggle with diverse head expressions, varying hairstyles, and natural blending beyond facial regions. To address these limitations, we propose Adaptive Head Synthesis (AHS), which effectively handles full upper-body images with varied head poses and expressions. AHS incorporates a novel head-reenacted synthetic data augmentation strategy to overcome self-supervised training constraints, enhancing generalization across diverse facial expressions and orientations without requiring paired training data. Comprehensive experiments demonstrate that AHS achieves superior performance in challenging real-world scenarios, producing visually coherent results that preserve identity and expression fidelity across various head orientations and hairstyles. Notably, AHS shows exceptional robustness in maintaining facial identity under drastic expression changes and in faithfully preserving accessories under significant head pose variations.