Cov2Pose: Leveraging Spatial Covariance for Direct Manifold-aware 6-DoF Object Pose Estimation

arXiv cs.CV / 3/23/2026


Key Points

  • Introduces Cov2Pose, a direct, end-to-end 6-DoF object pose estimator that uses a covariance-pooled representation to capture the spatial second-order statistics of convolutional features.
  • Proposes encoding the pose as a symmetric positive definite (SPD) matrix via its Cholesky decomposition, and regressing it with a manifold-aware network head that accounts for the Riemannian geometry of SPD matrices.
  • Shows that second-order pooling and a continuous SPD pose representation improve the accuracy and robustness of direct pose regression compared with conventional direct heads, including under partial occlusion.
  • Reports experiments and ablations supporting the end-to-end pipeline which, like other direct methods, skips the 2D-keypoint + PnP stage that indirect approaches rely on and is therefore usually more computationally efficient.
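The covariance pooling named in the first key point can be sketched as follows. This is a minimal NumPy illustration under our own assumptions, not the Cov2Pose implementation: each spatial position of a C x H x W feature map is treated as one sample of a C-dimensional vector, the C x C sample covariance is computed, and a small ridge term `eps` (a hypothetical value) is added so the result is strictly positive definite. The names `covariance_pool` and `global_average_pool` are illustrative only.

```python
import numpy as np


def global_average_pool(features: np.ndarray) -> np.ndarray:
    """First-order pooling: one mean value per channel of a C x H x W map."""
    return features.mean(axis=(1, 2))


def covariance_pool(features: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Second-order pooling of a C x H x W feature map into a C x C SPD matrix.

    Each of the N = H*W spatial positions is one sample of a C-dim feature vector.
    The ridge term eps is an assumption of this sketch, not a value from the paper.
    """
    c, h, w = features.shape
    x = features.reshape(c, h * w)          # C x N sample matrix
    x = x - x.mean(axis=1, keepdims=True)   # center each channel
    cov = x @ x.T / (h * w - 1)             # unbiased sample covariance (only PSD)
    # The sample covariance can be singular (correlated channels, or N <= C),
    # so adding eps * I makes it strictly positive definite.
    return cov + eps * np.eye(c)


if __name__ == "__main__":
    # Two 2-channel, 2x2 maps with identical (zero) channel means
    # but opposite correlation between the two channels.
    f_corr = np.array([[[1.0, -1.0], [1.0, -1.0]],
                       [[1.0, -1.0], [1.0, -1.0]]])
    f_anti = np.array([[[1.0, -1.0], [1.0, -1.0]],
                       [[-1.0, 1.0], [-1.0, 1.0]]])
    print(global_average_pool(f_corr), global_average_pool(f_anti))  # both all-zero
    print(covariance_pool(f_corr)[0, 1], covariance_pool(f_anti)[0, 1])  # ≈ 1.333 and ≈ -1.333
    print(np.linalg.eigvalsh(covariance_pool(f_corr)))  # all > 0 thanks to eps
```

In the toy usage, global average pooling cannot tell the two maps apart, while the off-diagonal covariance entry flips sign. The correlated map also shows why the ridge term is needed: its raw covariance is singular, and only the added `eps * I` makes it SPD.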

Abstract

In this paper, we address the problem of 6-DoF object pose estimation from a single RGB image. Indirect methods, which typically predict intermediate 2D keypoints and then apply a Perspective-n-Point (PnP) solver, have shown strong performance. Direct approaches, which regress the pose in an end-to-end manner, are usually computationally more efficient but less accurate. Moreover, direct heads rely on globally pooled features, ignoring spatial second-order statistics even though these are informative for pose prediction. In most cases, they also predict discontinuous pose representations, which lack robustness. We therefore propose a covariance-pooled representation that encodes convolutional feature distributions as a symmetric positive definite (SPD) matrix. We further propose a novel pose encoding in the form of an SPD matrix via its Cholesky decomposition. The pose is then regressed end-to-end with a manifold-aware network head that takes into account the Riemannian geometry of SPD matrices. Experiments and ablations consistently demonstrate the relevance of second-order pooling and continuous representations for direct pose regression, including under partial occlusion.
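The abstract does not spell out how a 6-DoF pose is mapped to an SPD matrix, so the sketch below only illustrates the generic Cholesky parametrization such an encoding can build on, under our own assumptions. Every SPD matrix S has a unique factorization S = L L^T with L lower-triangular and strictly positive on the diagonal, so n(n+1)/2 unconstrained numbers (with the diagonal passed through `exp`) map smoothly onto SPD matrices and back. The helpers `vec_to_spd` and `spd_to_vec` are hypothetical names, not part of Cov2Pose.

```python
import numpy as np


def vec_to_spd(v: np.ndarray, n: int) -> np.ndarray:
    """Map n(n+1)/2 unconstrained reals to an n x n SPD matrix S = L L^T (sketch)."""
    if v.shape != (n * (n + 1) // 2,):
        raise ValueError("expected a vector of length n(n+1)/2")
    rows, cols = np.tril_indices(n)
    L = np.zeros((n, n))
    L[rows, cols] = v                                # fill the lower triangle
    L[np.diag_indices(n)] = np.exp(np.diag(L))       # strictly positive diagonal
    return L @ L.T


def spd_to_vec(S: np.ndarray) -> np.ndarray:
    """Inverse map: recover the unconstrained vector from an SPD matrix."""
    L = np.linalg.cholesky(S)                        # unique lower factor, diag > 0
    n = S.shape[0]
    L[np.diag_indices(n)] = np.log(np.diag(L))       # undo the exp on the diagonal
    rows, cols = np.tril_indices(n)
    return L[rows, cols]


if __name__ == "__main__":
    v = np.array([0.1, -0.3, 0.5, 0.2, 0.0, -0.4])   # 6 = 3 * 4 / 2 free parameters
    S = vec_to_spd(v, 3)
    print(np.linalg.eigvalsh(S))                     # all positive
    print(np.allclose(spd_to_vec(S), v))             # True: the map is invertible
```

Because `exp` and the matrix product are smooth and the factorization is unique, nearby parameter vectors give nearby SPD matrices and vice versa, which illustrates the kind of continuity the abstract contrasts with discontinuous pose representations.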
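To make "taking into account the Riemannian geometry of SPD matrices" concrete, the sketch below shows two standard tools, under our own assumptions: the matrix logarithm, which maps an SPD matrix to a flat vector that an ordinary linear layer can consume (in the spirit of the LogEig layer of SPDNet), and the affine-invariant Riemannian distance. Which layers and metric the Cov2Pose head actually uses is not stated here, and all function names are hypothetical. The inputs must be strictly positive definite, since `log` of a zero eigenvalue is undefined.

```python
import numpy as np


def spd_log(S: np.ndarray) -> np.ndarray:
    """Matrix logarithm of an SPD matrix: U diag(log(lambda)) U^T."""
    eigvals, eigvecs = np.linalg.eigh(S)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def spd_exp(X: np.ndarray) -> np.ndarray:
    """Matrix exponential of a symmetric matrix (inverse of spd_log)."""
    eigvals, eigvecs = np.linalg.eigh(X)
    return (eigvecs * np.exp(eigvals)) @ eigvecs.T


def log_euclidean_vec(S: np.ndarray) -> np.ndarray:
    """Flatten log(S) for a Euclidean layer; off-diagonals are scaled by sqrt(2)
    so the vector's 2-norm equals the Frobenius norm of log(S)."""
    X = spd_log(S)
    rows, cols = np.triu_indices(X.shape[0])
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return weights * X[rows, cols]


def log_euclidean_distance(A: np.ndarray, B: np.ndarray) -> float:
    """||log(A) - log(B)||_F."""
    return float(np.linalg.norm(spd_log(A) - spd_log(B), ord="fro"))


def affine_invariant_distance(A: np.ndarray, B: np.ndarray) -> float:
    """Riemannian distance ||log(A^{-1/2} B A^{-1/2})||_F."""
    eigvals, eigvecs = np.linalg.eigh(A)
    a_inv_sqrt = (eigvecs / np.sqrt(eigvals)) @ eigvecs.T
    return float(np.linalg.norm(spd_log(a_inv_sqrt @ B @ a_inv_sqrt), ord="fro"))


if __name__ == "__main__":
    A, B = np.diag([1.0, 2.0]), np.diag([4.0, 8.0])
    # For commuting (here diagonal) matrices both metrics give sqrt(2) * log(4).
    print(log_euclidean_distance(A, B), affine_invariant_distance(A, B))
    # The flattened tangent vectors reproduce the Log-Euclidean distance.
    print(np.linalg.norm(log_euclidean_vec(A) - log_euclidean_vec(B)))
```

The Log-Euclidean route is cheap and yields a flat vector that a standard regression layer can use, while the affine-invariant distance is additionally unchanged under any congruence S -> G S G^T with invertible G; plain Frobenius distance on raw matrix entries has neither property.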