HGGT: Robust and Flexible 3D Hand Mesh Reconstruction from Uncalibrated Images

arXiv cs.CV / 3/26/2026


Key Points

  • The paper tackles high-fidelity 3D hand mesh reconstruction from images while targeting deployment flexibility without requiring calibrated camera setups.
  • It addresses the tradeoff between single-view methods (which struggle with depth ambiguity and occlusions) and multi-view calibrated systems (which are less usable in real-world settings).
  • The authors propose a feed-forward architecture that jointly infers 3D hand meshes and camera poses from uncalibrated, arbitrary views.
  • The method is motivated by 3D foundation-model ideas that learn explicit geometry from visual data, reformulating reconstruction as a visual-geometry grounded task.
  • Experiments report state-of-the-art performance on benchmarks and strong generalization to uncalibrated, in-the-wild scenarios, with a public project page provided.
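
To make the single-view versus multi-view tradeoff concrete, the sketch below shows why one image cannot fix depth and how a second view removes the ambiguity. It is a textbook illustration, not the paper's method: it uses classical linear (DLT) triangulation with camera matrices that are assumed to be known, which is exactly the calibration requirement HGGT is designed to avoid by inferring camera poses jointly with the mesh. The intrinsics `K`, the 0.2 m baseline, and the `fingertip` point are all made-up values for illustration.

```python
import numpy as np

def project(P, X):
    """Project a 3D point X (shape (3,)) to pixels with a 3x4 camera matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(P1, x1, P2, x2):
    """Linear (DLT) triangulation of a single point from two calibrated views."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The homogeneous solution is the right singular vector of A
    # with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    Xh = Vt[-1]
    return Xh[:3] / Xh[3]

# Hypothetical pinhole intrinsics (focal length 500 px, principal point 320x240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

# Camera 1 sits at the world origin; camera 2 is shifted 0.2 m along x.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])

fingertip = np.array([0.05, -0.02, 0.5])  # made-up 3D point, in metres
scaled = 2.0 * fingertip                  # twice as far along the same camera-1 ray

# Single view: both points land on the same pixel, so depth is unobservable.
same_pixel = np.allclose(project(P1, fingertip), project(P1, scaled))

# Second view: the two candidates now project to different pixels.
differs = not np.allclose(project(P2, fingertip), project(P2, scaled))

# With both views and known cameras, the 3D point can be recovered.
recovered = triangulate(P1, project(P1, fingertip), P2, project(P2, fingertip))
```

Here `same_pixel` and `differs` both evaluate to true, and `recovered` matches `fingertip` up to floating-point error. The catch is that `P1` and `P2` had to be supplied; in the uncalibrated setting the article describes, those camera parameters are unknown and must be estimated alongside the hand geometry.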

Abstract

Recovering high-fidelity 3D hand geometry from images is a critical task in computer vision, with significant value for domains such as robotics, animation, and VR/AR. Scalable applications demand both accuracy and deployment flexibility: they must be able to leverage massive amounts of unstructured image data from the internet, or to run on consumer-grade RGB cameras without complex calibration. However, current methods face a dilemma. Single-view approaches are easy to deploy but suffer from depth ambiguity and occlusion. Multi-view systems resolve these uncertainties but typically demand fixed, calibrated setups, which limits their real-world utility. To bridge this gap, we draw inspiration from 3D foundation models that learn explicit geometry directly from visual data. By reformulating hand reconstruction from arbitrary views as a visual-geometry grounded task, we propose a feed-forward architecture that, for the first time in the literature, jointly infers 3D hand meshes and camera poses from uncalibrated views. Extensive evaluations show that our approach outperforms state-of-the-art methods on benchmarks and generalizes strongly to uncalibrated, in-the-wild scenarios. Project page: https://lym29.github.io/HGGT/.