Autoregressive Appearance Prediction for 3D Gaussian Avatars

arXiv cs.CV / 4/2/2026


Key Points

  • The paper addresses instability in 3D Gaussian Splatting human avatar rendering caused by pose/appearance ambiguities in large datasets, which can lead to overfitting and abrupt appearance changes for novel poses.
  • It proposes a 3D Gaussian avatar model with a spatial MLP backbone conditioned on pose plus a learned appearance latent to better disambiguate pose-driven renderings.
  • During training, an encoder learns a compact appearance latent representation that improves reconstruction quality and reduces spurious correlations.
  • At inference (driving) time, an autoregressive predictor infers the latent to achieve temporally smooth and more stable appearance evolution across frames.

Abstract

A photorealistic and immersive human avatar experience demands capturing fine, person-specific details such as cloth and hair dynamics, subtle facial expressions, and characteristic motion patterns. Achieving this requires large, high-quality datasets, which often introduce ambiguities and spurious correlations when very similar poses correspond to different appearances. Models that fit these details during training can overfit and produce unstable, abrupt appearance changes for novel poses. We propose a 3D Gaussian Splatting avatar model with a spatial MLP backbone that is conditioned on both pose and an appearance latent. The latent is learned during training by an encoder, yielding a compact representation that improves reconstruction quality and helps disambiguate pose-driven renderings. At driving time, our predictor autoregressively infers the latent, producing temporally smooth appearance evolution and improved stability. Overall, our method delivers a robust and practical path to high-fidelity, stable avatar driving.
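To make the driving-time pipeline concrete, here is a minimal NumPy sketch of the loop the abstract describes: a decoder conditioned on pose plus an appearance latent, with the latent inferred autoregressively frame to frame. All dimensions, weight shapes, and function names are hypothetical stand-ins, not the paper's actual architecture; random matrices take the place of trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: pose vector, appearance latent, hidden width, Gaussian count.
POSE_DIM, LATENT_DIM, HIDDEN, N_GAUSS = 24, 8, 32, 100

# Random weights stand in for trained parameters.
W1 = rng.normal(0, 0.1, (POSE_DIM + LATENT_DIM, HIDDEN))
W2 = rng.normal(0, 0.1, (HIDDEN, N_GAUSS * 3))            # e.g. per-Gaussian color offsets
A = rng.normal(0, 0.1, (LATENT_DIM + POSE_DIM, LATENT_DIM))  # autoregressive predictor

def decode(pose, latent):
    """Stand-in for the spatial MLP backbone: maps (pose, appearance latent)
    to a per-Gaussian appearance attribute (here, a 3-vector per Gaussian)."""
    h = np.tanh(np.concatenate([pose, latent]) @ W1)
    return (h @ W2).reshape(N_GAUSS, 3)

def predict_latent(prev_latent, pose):
    """Autoregressive step: infer the current appearance latent from the
    previous latent and the current pose, instead of re-encoding each frame."""
    return np.tanh(np.concatenate([prev_latent, pose]) @ A)

# Driving loop: the latent evolves smoothly across frames, which is what
# yields the temporally stable appearance the paper targets.
latent = np.zeros(LATENT_DIM)
frames = []
for t in range(5):
    pose = rng.normal(size=POSE_DIM)      # driving pose for frame t
    latent = predict_latent(latent, pose)
    frames.append(decode(pose, latent))
```

At training time, by contrast, the latent would come from the encoder applied to ground-truth appearance; the autoregressive predictor only takes over at driving time, when no ground-truth frame is available.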
