Pose-Aware Diffusion for 3D Generation

arXiv cs.CV / 5/4/2026


Key Points

  • Pose-Aware Diffusion (PAD) is a new end-to-end diffusion framework aimed at generating 3D objects aligned with a target pose, addressing ambiguities in canonical-then-rotate pipelines.
  • PAD generates 3D geometry directly in the observation space by unprojecting monocular depth into a partial point cloud and using it as an explicit 3D geometric anchor to provide stronger spatial supervision.
  • The method removes pose ambiguity intrinsically, producing high-fidelity pose-aligned assets with improved geometric alignment.
  • Experiments show PAD outperforms existing state-of-the-art approaches in both geometric alignment and image-to-3D correspondence.
  • PAD extends to compositional 3D scene reconstruction by taking the union of independently generated objects, preserving accurate spatial layouts across multiple parts.
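The geometric anchor described above starts from a standard pinhole-camera back-projection of monocular depth. The paper does not publish this code; the sketch below is a minimal, generic implementation of depth unprojection, where the function name `unproject_depth` and the optional `mask` argument are illustrative assumptions, not PAD's actual API.

```python
import numpy as np

def unproject_depth(depth, K, mask=None):
    """Back-project a depth map into a partial point cloud in camera space.

    depth: (H, W) metric depth map.
    K:     3x3 pinhole intrinsics [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    mask:  optional (H, W) boolean validity mask (hypothetical convenience arg).
    Returns an (N, 3) array of points in the observation (camera) frame.
    """
    H, W = depth.shape
    # Pixel coordinate grids: u along width, v along height.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth
    # Invert the pinhole projection: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    if mask is not None:
        pts = pts[mask.reshape(-1)]
    return pts
```

Because depth is only observed from one viewpoint, the result is inherently a *partial* point cloud; PAD uses such a cloud as the explicit 3D anchor that conditions generation in the observation space.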

Abstract

Generating pose-aligned 3D objects is challenging due to the spatial mismatches and transformation ambiguities inherent in decoupled canonical-then-rotate paradigms. To this end, we introduce Pose-Aware Diffusion (PAD), a novel end-to-end diffusion framework that synthesizes 3D geometry directly within the observation space. By unprojecting monocular depth into a partial point cloud and explicitly injecting it as a 3D geometric anchor, PAD abandons canonical assumptions to enforce rigorous spatial supervision. This native generation intrinsically resolves pose ambiguity, producing high-fidelity pose-aligned assets. Extensive experiments demonstrate that PAD achieves superior geometric alignment and image-to-3D correspondence compared to state-of-the-art methods. Additionally, PAD naturally extends to compositional 3D scene reconstruction via a simple union of independently generated objects, highlighting its robust ability to preserve precise spatial layouts.
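The "simple union" claim above follows from generating each object directly in the shared observation frame: no per-object pose registration is required before merging. A minimal sketch of that idea, assuming each object is already a point cloud expressed in the same camera frame (the function name `union_scene` is hypothetical):

```python
import numpy as np

def union_scene(object_point_clouds):
    """Compose a scene from per-object point clouds in one shared frame.

    Since every cloud is already expressed in the observation space,
    composition reduces to concatenation -- no alignment step needed.
    """
    return np.concatenate(object_point_clouds, axis=0)
```

Contrast this with canonical-then-rotate pipelines, where each object would first need an estimated rigid transform back into the scene, a step the abstract identifies as the source of pose ambiguity.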