AI Navigate

Feed-forward Gaussian Registration for Head Avatar Creation and Editing

arXiv cs.CV / 3/18/2026


Key Points

  • MATCH (Multi-view Avatars from Topologically Corresponding Heads) is a Gaussian registration method for fast head avatar creation and editing from calibrated multi-view images, predicting Gaussian splat textures in correspondence in 0.5 seconds per frame.
  • The approach eliminates time-consuming head tracking and expensive avatar optimization, cutting typical creation time from more than a day to about 0.5 seconds per frame.
  • It uses a transformer-based model to estimate textures in a fixed UV layout with a novel registration-guided attention block, where each UV-map token attends only to image tokens from its corresponding mesh region, improving efficiency over dense cross-view attention.
  • The method enables cross-subject correspondence for applications such as expression transfer, semantic editing, identity interpolation, and optimization-free tracking, and it outperforms existing methods in novel-view synthesis, geometry registration, and head avatar generation, achieving a tenfold speedup over the closest baseline.
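The registration-guided attention block described above restricts each UV-map query token to a subset of image tokens. A minimal sketch of that idea, as masked cross-attention in NumPy (all shapes, names, and the masking scheme here are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def registration_guided_attention(uv_tokens, img_tokens, region_mask):
    """Masked cross-attention sketch: each UV token attends only to the
    image tokens its mask row allows (i.e. tokens depicting its mesh region).

    uv_tokens:   (U, d) query tokens on the fixed UV layout
    img_tokens:  (I, d) multi-view image tokens (used as keys and values)
    region_mask: (U, I) boolean; True where attention is permitted.
                 Assumes every UV token has at least one visible image token.
    """
    d = uv_tokens.shape[-1]
    scores = uv_tokens @ img_tokens.T / np.sqrt(d)      # (U, I) similarity
    scores = np.where(region_mask, scores, -np.inf)     # block out-of-region tokens
    scores -= scores.max(axis=-1, keepdims=True)        # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)      # masked entries get weight 0
    return weights @ img_tokens                         # (U, d) aggregated features
```

Because masked entries contribute zero weight, sparsifying the mask shrinks the effective attention pattern, which is the efficiency gain the paper claims over dense cross-view attention.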

Abstract

We present MATCH (Multi-view Avatars from Topologically Corresponding Heads), a multi-view Gaussian registration method for high-quality head avatar creation and editing. State-of-the-art multi-view head avatar methods require time-consuming head tracking followed by expensive avatar optimization, often resulting in a total creation time of more than one day. MATCH, in contrast, directly predicts Gaussian splat textures in correspondence from calibrated multi-view images in just 0.5 seconds per frame, without requiring data preprocessing. The learned intra-subject correspondence across frames enables fast creation of personalized head avatars, while correspondence across subjects supports applications such as expression transfer, optimization-free tracking, semantic editing, and identity interpolation. We establish these correspondences end-to-end using a transformer-based model that predicts Gaussian splat textures in the fixed UV layout of a template mesh. To achieve this, we introduce a novel registration-guided attention block, where each UV-map token attends exclusively to image tokens depicting its corresponding mesh region. This design improves efficiency and performance compared to dense cross-view attention. MATCH outperforms existing methods in novel-view synthesis, geometry registration, and head avatar generation, while making avatar creation 10 times faster than the closest competing baseline. The code and model weights are available on the project website.