Any3DAvatar: Fast and High-Quality Full-Head 3D Avatar Reconstruction from Single Portrait Image

arXiv cs.CV / 4/16/2026


Key Points

  • The paper proposes Any3DAvatar, a single-portrait method for reconstructing a full 3D head as 3D Gaussians that targets the long-standing quality-versus-speed trade-off.
  • Its fastest setting claims sub-second reconstruction (under one second) while preserving high-fidelity geometry and texture compared with prior single-image full-head approaches.
  • The authors introduce AnyHead, a unified training data suite designed to improve coverage, full-head geometry, and complex appearance (including accessories) by combining identity diversity and dense multi-view supervision.
  • Their approach uses a Plücker-aware structured 3D Gaussian scaffold with one-step conditional denoising (single forward pass) rather than unstructured noise sampling, aiming to retain detailed reconstruction quality.
  • They add view-conditioned appearance supervision on latent tokens to enhance novel-view texture details without increasing inference cost.
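The "Plücker-aware" scaffold above refers to the standard Plücker parameterization of camera rays, where each pixel's ray is encoded as its unit direction plus the moment (origin cross direction). As a minimal sketch of that embedding (a hypothetical helper, not the paper's code, assuming a pinhole camera with world-to-camera extrinsics):

```python
import numpy as np

def plucker_rays(K, R, t, H, W):
    """Per-pixel Plücker ray embeddings (d, o x d) for a pinhole camera.

    K: 3x3 intrinsics; R, t: world-to-camera rotation and translation.
    Returns an (H, W, 6) array. Illustrative helper, not the paper's API.
    """
    # Camera center in world coordinates: o = -R^T t
    o = -R.T @ t
    # Pixel-center grid back-projected to camera-space directions
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)            # (H, W, 3)
    dirs_cam = pix @ np.linalg.inv(K).T                          # unproject
    dirs = dirs_cam @ R                                          # world: R^T d
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)         # unit dirs
    moment = np.cross(np.broadcast_to(o, dirs.shape), dirs)      # o x d
    return np.concatenate([dirs, moment], axis=-1)               # (H, W, 6)
```

The 6-D embedding is constant along each ray, which is what makes it a convenient per-pixel conditioning signal for view-aware 3D models.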

Abstract

Reconstructing a complete 3D head from a single portrait remains challenging because existing methods still face a sharp quality-speed trade-off: high-fidelity pipelines often rely on multi-stage processing and per-subject optimization, while fast feed-forward models struggle with complete geometry and fine appearance details. To bridge this gap, we propose Any3DAvatar, a fast and high-quality method for single-image 3D Gaussian head avatar generation, whose fastest setting reconstructs a full head in under one second while preserving high-fidelity geometry and texture. First, we build AnyHead, a unified data suite that combines identity diversity, dense multi-view supervision, and realistic accessories, filling the main gaps of existing head data in coverage, full-head geometry, and complex appearance. Second, rather than sampling unstructured noise, we initialize from a Plücker-aware structured 3D Gaussian scaffold and perform one-step conditional denoising, formulating full-head reconstruction as a single forward pass while retaining high fidelity. Third, we introduce auxiliary view-conditioned appearance supervision on the same latent tokens alongside 3D Gaussian reconstruction, improving novel-view texture details at zero extra inference cost. Experiments show that Any3DAvatar outperforms prior single-image full-head reconstruction methods in rendering fidelity while remaining substantially faster.
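The third contribution, auxiliary view-conditioned supervision that costs nothing at inference, can be sketched as a training objective in which an extra appearance branch reads the same latent tokens but is simply dropped at test time. All function and parameter names below are illustrative placeholders under that assumption, not the paper's actual API:

```python
import numpy as np

def l2(a, b):
    """Mean squared error between two arrays."""
    return float(np.mean((a - b) ** 2))

def train_step(tokens, view_embed, gt_render, gt_view_image,
               decode_gaussians, render, aux_decode, lam=0.1):
    """Joint training loss: 3D Gaussian reconstruction plus auxiliary
    view-conditioned appearance supervision on the SAME latent tokens.

    The aux_decode branch exists only during training; inference uses
    just decode_gaussians + render, so its cost is unchanged.
    Placeholder names; lam is a hypothetical weighting hyperparameter.
    """
    # Main path: tokens -> 3D Gaussians -> rendered image
    recon_loss = l2(render(decode_gaussians(tokens)), gt_render)
    # Auxiliary path: tokens + view embedding -> predicted novel-view image
    aux_loss = l2(aux_decode(tokens, view_embed), gt_view_image)
    return recon_loss + lam * aux_loss
```

The design choice this illustrates: because the auxiliary head attaches to latents the main path already computes, the extra gradient signal sharpens novel-view texture without adding any modules to the deployed model.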