Structure-Aware Fine-Grained Gaussian Splatting for Expressive Avatar Reconstruction

arXiv cs.CV / 4/13/2026


Key Points

  • The paper introduces Structure-aware Fine-grained Gaussian Splatting (SFGS) to reconstruct photorealistic, topology-aware 3D human avatars from monocular video while preserving expressive details like hands and facial expressions.
  • SFGS combines spatial-only triplanes with a time-aware hexplane to model dynamic features across consecutive frames and improve pose-dependent texture and expression.
  • It adds a structure-aware Gaussian module to capture fine details in a spatially coherent way, addressing limitations of prior methods that miss subtle motion and surface changes.
  • A residual refinement module is proposed specifically to better model hand deformations via fine-grained hand reconstruction.
  • The authors report single-stage training and claim improved performance over state-of-the-art baselines, and they provide an associated GitHub code repository.
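The triplane/hexplane combination in the second bullet can be illustrated with a small sketch. The paper's exact fusion scheme is not described here, so the following is a minimal, hypothetical hexplane query: a 4D point (x, y, z, t) is projected onto six axis-aligned feature planes, each is bilinearly interpolated, and the per-plane features are fused by elementwise product (one common choice in plane-factorized fields); all grid sizes and the fusion rule are assumptions, not taken from the paper.

```python
import numpy as np

def bilerp(plane, u, v):
    """Bilinearly interpolate a (H, W, C) feature plane at normalized (u, v) in [0, 1]."""
    H, W, _ = plane.shape
    x, y = u * (W - 1), v * (H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    fx, fy = x - x0, y - y0
    return ((1 - fx) * (1 - fy) * plane[y0, x0]
            + fx * (1 - fy) * plane[y0, x1]
            + (1 - fx) * fy * plane[y1, x0]
            + fx * fy * plane[y1, x1])

def hexplane_features(planes, p):
    """Query a hexplane at a 4D point p = (x, y, z, t), coords in [0, 1].
    `planes` maps each axis pair to a (H, W, C) grid; a spatial-only
    triplane would use just the "xy", "xz", "yz" entries."""
    x, y, z, t = p
    pairs = {"xy": (x, y), "xz": (x, z), "yz": (y, z),
             "xt": (x, t), "yt": (y, t), "zt": (z, t)}
    feat = np.ones(next(iter(planes.values())).shape[-1])
    for name, (u, v) in pairs.items():
        feat *= bilerp(planes[name], u, v)  # product fusion (assumed)
    return feat

# Toy usage: six random 16x16 planes with 8 feature channels.
rng = np.random.default_rng(0)
planes = {k: rng.standard_normal((16, 16, 8))
          for k in ["xy", "xz", "yz", "xt", "yt", "zt"]}
f = hexplane_features(planes, (0.3, 0.7, 0.5, 0.1))
```

Because the time coordinate t enters through the "xt", "yt", "zt" planes, the same spatial point yields different features at different frames, which is what lets a hexplane model pose-dependent appearance across consecutive frames.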

Abstract

Reconstructing photorealistic and topology-aware human avatars from monocular videos remains a significant challenge in the fields of computer vision and graphics. While existing 3D human avatar modeling approaches can effectively capture body motion, they often fail to accurately model fine details such as hand movements and facial expressions. To address this, we propose Structure-aware Fine-grained Gaussian Splatting (SFGS), a novel method for reconstructing expressive and coherent full-body 3D human avatars from a monocular video sequence. SFGS uses both a spatial-only triplane and a time-aware hexplane to capture dynamic features across consecutive frames. A structure-aware Gaussian module is designed to capture pose-dependent details in a spatially coherent manner and improve pose and texture expression. To better model hand deformations, we also propose a residual refinement module based on fine-grained hand reconstruction. Our method requires only single-stage training and outperforms state-of-the-art baselines in both quantitative and qualitative evaluations, generating high-fidelity avatars with natural motion and fine details. The code is available on GitHub: https://github.com/Su245811YZ/SFGS
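The residual refinement idea mentioned for hand deformations can be sketched in a simplified form: a small network predicts per-Gaussian offsets that are added on top of the base reconstruction, restricted to hand points. The abstract does not specify the module's architecture, so everything below (the tiny MLP, the hand mask, the weight shapes) is an illustrative assumption, not the paper's implementation.

```python
import numpy as np

def residual_refine(base_positions, hand_mask, pose_feat, W1, b1, W2, b2):
    """Hypothetical residual refinement: a two-layer MLP maps a per-point
    pose feature to a 3D offset, which is added only where hand_mask == 1.
    Weight names and shapes are illustrative, not from the paper."""
    h = np.tanh(pose_feat @ W1 + b1)      # (N, hidden) hidden activations
    delta = h @ W2 + b2                   # (N, 3) predicted residual offsets
    return base_positions + hand_mask[:, None] * delta

# Toy usage: 100 Gaussians, 16-dim pose features, 32 hidden units.
rng = np.random.default_rng(1)
N, D, H = 100, 16, 32
pts = rng.standard_normal((N, 3))             # base Gaussian centers
mask = (rng.random(N) < 0.2).astype(float)    # which Gaussians belong to hands
feat = rng.standard_normal((N, D))
W1, b1 = rng.standard_normal((D, H)) * 0.1, np.zeros(H)
W2, b2 = rng.standard_normal((H, 3)) * 0.1, np.zeros(3)
refined = residual_refine(pts, mask, feat, W1, b1, W2, b2)
```

Predicting a residual rather than absolute positions keeps the base avatar intact where it is already accurate and concentrates the learned correction on the high-frequency hand region.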