PASR: Pose-Aware 3D Shape Retrieval from Occluded Single Views

arXiv cs.CV / 4/27/2026


Key Points

  • The paper introduces PASR, a pose-aware framework for retrieving 3D shapes from a single, potentially occluded image.
  • PASR formulates retrieval as a feature-level analysis-by-synthesis problem, distilling knowledge from the 2D foundation model DINOv3 into a 3D encoder.
  • During inference, it uses test-time optimization to jointly search for the best shape and pose by reconstructing patch-level 2D feature maps from the input image.
  • The method is designed to be robust to partial occlusions and to better capture fine-grained geometric details, outperforming prior approaches on both clean and occluded benchmarks.
  • PASR also supports multiple tasks in one framework, delivering strong shape retrieval along with competitive pose estimation and category classification.
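The distillation idea in the points above can be illustrated with a minimal sketch. It assumes a point-cloud shape with per-point features from a hypothetical 3D encoder, an orthographic camera, and a yaw/pitch pose parameterization; none of these choices come from the paper. Features are projected under a given pose, averaged per patch, and compared with a 2D patch feature map (which in PASR would come from DINOv3) using a cosine distance. The sketch ignores visibility (there is no z-buffer), and it only computes the loss. Training would minimize this loss with respect to the 3D encoder's parameters.

```python
import numpy as np

def pose_rotation(yaw: float, pitch: float) -> np.ndarray:
    """Hypothetical pose parameterization: yaw about z, then pitch about x."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]])
    return rx @ rz

def project_features(points, feats, rot, grid=16):
    """Orthographically splat per-point features onto a grid x grid patch map.

    points: (N, 3) coordinates within the unit ball.
    feats:  (N, C) per-point features (stand-in for a 3D encoder's output).
    Returns a (grid, grid, C) map of averaged features and a (grid, grid) coverage mask.
    Visibility is ignored: points on the far side of the shape also contribute.
    """
    cam = points @ rot.T                        # rotate into the camera frame
    uv = (cam[:, :2] + 1.0) / 2.0 * grid        # drop depth, map x, y to [0, grid]
    ij = np.clip(uv.astype(int), 0, grid - 1)   # patch (column, row) per point
    flat = ij[:, 1] * grid + ij[:, 0]           # row-major patch index
    sums = np.zeros((grid * grid, feats.shape[1]))
    counts = np.zeros(grid * grid)
    np.add.at(sums, flat, feats)                # accumulate features per patch
    np.add.at(counts, flat, 1.0)
    mask = counts > 0
    sums[mask] /= counts[mask][:, None]         # average over points in each patch
    return sums.reshape(grid, grid, -1), mask.reshape(grid, grid)

def distill_loss(proj, mask, target, eps=1e-8):
    """Mean (1 - cosine similarity) between projected and 2D features on covered patches."""
    a, b = proj[mask], target[mask]
    cos = (a * b).sum(-1) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps)
    return float(np.mean(1.0 - cos))

# Toy check: the target map comes from the same projection, standing in for the
# 2D feature map of an image taken at pose (yaw=0, pitch=0.3).
rng = np.random.default_rng(0)
pts = rng.normal(size=(2000, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)   # points on the unit sphere
feats = rng.normal(size=(2000, 8))
target, _ = project_features(pts, feats, pose_rotation(0.0, 0.3))
same, m1 = project_features(pts, feats, pose_rotation(0.0, 0.3))
other, m2 = project_features(pts, feats, pose_rotation(np.pi / 2, 0.3))
loss_same = distill_loss(same, m1, target)     # close to 0: poses agree
loss_other = distill_loss(other, m2, target)   # larger: wrong pose
```

Because the loss is computed per patch under an explicit pose, a mismatch in pose shows up directly as a higher loss, which is the property a pose-aware alignment relies on.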

Abstract

Single-view 3D shape retrieval is a fundamental yet challenging task that is increasingly important with the growth of available 3D data. Existing approaches largely fall into two categories: those that use contrastive learning to map point cloud features into existing vision-language spaces, and those that learn a common embedding space for 2D images and 3D shapes. However, these feed-forward, holistic alignments are often difficult to interpret, which in turn limits their robustness and generalization to real-world applications. To address this problem, we propose Pose-Aware 3D Shape Retrieval (PASR), a framework that formulates retrieval as a feature-level analysis-by-synthesis problem by distilling knowledge from a 2D foundation model (DINOv3) into a 3D encoder. By aligning pose-conditioned 3D projections with 2D feature maps, our method bridges the gap between real-world images and synthetic meshes. During inference, PASR performs test-time optimization via analysis-by-synthesis, jointly searching for the shape and pose that best reconstruct the patch-level feature map of the input image. This synthesis-based optimization is inherently robust to partial occlusion and sensitive to fine-grained geometric details. PASR outperforms existing methods by a wide margin on both clean and occluded 3D shape retrieval datasets. Additionally, PASR demonstrates strong multi-task capabilities, achieving robust shape retrieval, competitive pose estimation, and accurate category classification within a single framework.
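The joint search over shape and pose described in the abstract can be approximated, as a rough sketch, by exhaustive search over a discrete bank of pre-rendered, pose-conditioned feature maps. The bank, the trimmed-mean scoring, and the `trim` parameter are hypothetical stand-ins rather than the paper's procedure, and PASR's test-time optimization may refine pose continuously instead. The sketch shows why patch-level comparison helps under occlusion: an occluder corrupts only the patches it covers, so the remaining patches can still single out the correct candidate.

```python
import numpy as np

def patch_cosine(a, b, eps=1e-8):
    """Per-patch cosine similarity between two (H, W, C) feature maps -> (H, W)."""
    num = (a * b).sum(-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
    return num / den

def score(query, render, mask, trim=0.3):
    """Trimmed mean of per-patch similarity over the render's foreground.

    Dropping the lowest-scoring `trim` fraction of patches is an assumed way to
    discount patches that an occluder has overwritten.
    """
    sims = patch_cosine(query, render)[mask]
    if sims.size == 0:
        return -np.inf
    keep = max(1, int(round(sims.size * (1.0 - trim))))
    return float(np.sort(sims)[::-1][:keep].mean())

def retrieve(query, bank, masks, trim=0.3):
    """Exhaustive search over S shapes x P discrete poses.

    bank:  (S, P, H, W, C) hypothetical pre-rendered, pose-conditioned feature maps.
    masks: (S, P, H, W) boolean foreground masks for those renders.
    Returns (best_shape, best_pose, best_score).
    """
    n_shapes, n_poses = bank.shape[:2]
    best = (-1, -1, -np.inf)
    for s in range(n_shapes):
        for p in range(n_poses):
            sc = score(query, bank[s, p], masks[s, p], trim)
            if sc > best[2]:
                best = (s, p, sc)
    return best

# Toy run: the query is candidate (shape 2, pose 5) with about 30% of its
# patches (the first 5 of 16 columns) overwritten by random "occluder" features.
rng = np.random.default_rng(0)
S, P, H, W, C = 5, 8, 16, 16, 32
bank = rng.normal(size=(S, P, H, W, C))
masks = np.ones((S, P, H, W), dtype=bool)
query = bank[2, 5].copy()
query[:, :5] = rng.normal(size=(H, 5, C))
s, p, sc = retrieve(query, bank, masks)
print(s, p)  # 2 5
```

In this toy run the occluded query still retrieves shape 2 at pose 5, because its unoccluded patches match exactly while every other candidate matches only by chance. The returned pose index doubles as a coarse pose estimate, which loosely mirrors how a single search can serve both retrieval and pose estimation.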