Better Rigs, Not Bigger Networks: A Body Model Ablation for Gaussian Avatars

arXiv cs.CV / 4/3/2026


Key Points

  • The paper argues that gains in 3D Gaussian avatar reconstruction come more from a better body rig than from added training-pipeline complexity.
  • Replacing SMPL with the Momentum Human Rig (MHR), estimated via SAM-3D-Body, lets a minimal pipeline with no learned deformations or pose-dependent corrections reach the highest reported PSNR and competitive or superior LPIPS and SSIM on PeopleSnapshot and ZJU-MoCap.
  • Two controlled ablations separate pose-estimation quality from the body model’s representational capacity: SAM-3D-Body meshes are translated to SMPL-X, and the original dataset’s SMPL poses are translated into MHR, with both variants retrained under identical conditions.
  • The ablations indicate that body-model expressiveness has been a primary bottleneck, with both mesh representational capacity and pose-estimation quality contributing meaningfully to the full pipeline’s gains.

Abstract

Recent 3D Gaussian splatting methods built atop SMPL achieve remarkable visual fidelity while continually increasing the complexity of the overall training architecture. We demonstrate that much of this complexity is unnecessary: by replacing SMPL with the Momentum Human Rig (MHR), estimated via SAM-3D-Body, a minimal pipeline with no learned deformations or pose-dependent corrections achieves the highest reported PSNR and competitive or superior LPIPS and SSIM on PeopleSnapshot and ZJU-MoCap. To disentangle pose estimation quality from body model representational capacity, we perform two controlled ablations: translating SAM-3D-Body meshes to SMPL-X, and translating the original dataset's SMPL poses into MHR, both retrained under identical conditions. These ablations confirm that body model expressiveness has been a primary bottleneck in avatar reconstruction, with both mesh representational capacity and pose estimation quality contributing meaningfully to the full pipeline's gains.
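
For readers unfamiliar with the headline metric, PSNR is computed from the mean squared error between a rendered image and its ground-truth frame: PSNR = 10 · log10(MAX² / MSE). The sketch below is a minimal, generic illustration of that standard formula, not the paper's evaluation code. The function name `psnr`, the flat-list pixel representation, and the assumption that intensities are normalized to [0, 1] are all choices made here for illustration.

```python
import math


def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    Assumes (for this sketch) that pixels are floats in [0, max_value],
    flattened into plain Python sequences.
    """
    if len(rendered) != len(reference) or len(rendered) == 0:
        raise ValueError("inputs must be non-empty and the same length")

    # Mean squared error over all pixel values.
    mse = sum((r - g) ** 2 for r, g in zip(rendered, reference)) / len(rendered)

    # Identical images have zero error, so PSNR is unbounded.
    if mse == 0.0:
        return math.inf

    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher PSNR means lower pixel-wise error. Because it is a purely pixel-wise measure, it is typically reported alongside perceptual and structural metrics such as LPIPS and SSIM, as the abstract does.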