F3G-Avatar: Face Focused Full-body Gaussian Avatar

arXiv cs.CV / 4/14/2026


Key Points

  • The paper introduces F3G-Avatar, a face-aware full-body Gaussian avatar synthesis method designed to preserve fine-grained facial geometry and expressions that prior full-body Gaussian approaches often miss.
  • F3G-Avatar builds animatable 3D Gaussian representations from multi-view RGB video plus regressed pose/shape parameters, using an MHR (Momentum Human Rig) template and a two-branch architecture (body deformation + face-focused deformation).
  • The method renders front/back positional maps, decodes them into 3D Gaussians, fuses the results, applies linear blend skinning (LBS), and trains with differentiable Gaussian splatting for end-to-end rendering.
  • Training uses a mix of reconstruction and perceptual losses plus a face-specific adversarial loss to improve realism in close-up face views.
  • Experiments on AvatarReX report strong face-view performance (PSNR/SSIM/LPIPS of 26.243/0.964/0.084), with ablations showing the importance of both the MHR template and the face-focused deformation branch.
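The posing step in the pipeline above (fused Gaussians posed with linear blend skinning) can be sketched as follows. This is a minimal, hypothetical numpy illustration of standard LBS applied to Gaussian centers, not the paper's implementation; the function name and shapes are assumptions.

```python
import numpy as np

def lbs(points, weights, transforms):
    """Pose rest-space points with linear blend skinning (LBS).

    points:     (N, 3) rest-pose Gaussian centers
    weights:    (N, J) per-point skinning weights (each row sums to 1)
    transforms: (J, 4, 4) per-joint rigid transforms for the target pose
    """
    # Lift points to homogeneous coordinates: (N, 4)
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    # Blend the per-joint transforms by the skinning weights: (N, 4, 4)
    blended = np.einsum("nj,jab->nab", weights, transforms)
    # Apply each point's blended transform: (N, 4)
    posed = np.einsum("nab,nb->na", blended, homo)
    return posed[:, :3]
```

In the full method, the posed Gaussians (centers plus per-Gaussian covariance, opacity, and color) would then be rendered with differentiable Gaussian splatting, so gradients flow back through this posing step during training.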

Abstract

Existing full-body Gaussian avatar methods primarily optimize global reconstruction quality and often fail to preserve fine-grained facial geometry and expression details. This challenge arises from limited facial representational capacity that causes difficulties in modeling high-frequency pose-dependent deformations. To address this, we propose F3G-Avatar, a full-body, face-aware avatar synthesis method that reconstructs animatable human representations from multi-view RGB video and regressed pose/shape parameters. Starting from a clothed Momentum Human Rig (MHR) template, front/back positional maps are rendered and decoded into 3D Gaussians through a two-branch architecture: a body branch that captures pose-dependent non-rigid deformations and a face-focused deformation branch that refines head geometry and appearance. The predicted Gaussians are fused, posed with linear blend skinning (LBS), and rendered with differentiable Gaussian splatting. Training combines reconstruction and perceptual objectives with a face-specific adversarial loss to enhance realism in close-up views. Experiments demonstrate strong rendering quality, with face-view performance reaching PSNR/SSIM/LPIPS of 26.243/0.964/0.084 on the AvatarReX dataset. Ablations further highlight contributions of the MHR template and the face-focused deformation. F3G-Avatar provides a practical, high-quality pipeline for realistic, animatable full-body avatar synthesis.
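The training objective described above (reconstruction and perceptual terms plus a face-specific adversarial term) might be combined along these lines. This is a hedged sketch: the weight values, function names, and the use of a plain L1 reconstruction term are assumptions, not details reported by the paper.

```python
import numpy as np

# Hypothetical loss weights; the paper's actual values are not given here.
W_RECON, W_PERC, W_FACE_ADV = 1.0, 0.1, 0.01

def l1_recon(pred, target):
    """Per-pixel L1 reconstruction loss between the rendered and ground-truth images."""
    return np.abs(pred - target).mean()

def total_loss(pred, target, l_perceptual, l_face_adv):
    """Weighted sum of reconstruction, perceptual, and face adversarial terms,
    mirroring the training objective described in the abstract. The perceptual
    term (e.g. LPIPS) and the adversarial term on face crops are passed in as
    precomputed scalars for simplicity."""
    return (W_RECON * l1_recon(pred, target)
            + W_PERC * l_perceptual
            + W_FACE_ADV * l_face_adv)
```

The face-specific adversarial term would be computed on cropped face regions only, which is what pushes the model toward realism in close-up face views rather than just global reconstruction quality.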