GraphiContact: Pose-aware Human-Scene Robust Contact Perception for Interactive Systems

arXiv cs.CV / 3/24/2026

Key Points

  • GraphiContact tackles monocular, vertex-level human-scene contact prediction by studying it jointly with single-image 3D human mesh reconstruction, using the reconstructed body geometry as a scaffold for contact reasoning.
  • The method is pose-aware and transfers complementary human priors from two pretrained Transformer encoders to predict per-vertex contact on the reconstructed mesh.
  • To handle occlusion and perceptual noise, GraphiContact introduces a Single-Image Multi-Infer Uncertainty (SIMU) training strategy with token-level adaptive routing, which simulates occlusion and noisy observations during training while keeping efficient single-branch inference at test time.
  • Experiments on five benchmark datasets show consistent gains on both contact prediction and 3D human reconstruction.
  • The authors plan to release the code publicly at https://github.com/Aveiro-Lin/GraphiContact and position the system for interactive applications such as assistive monitoring, embodied AI, and rehabilitation analysis.
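Vertex-level contact prediction yields one contact probability per vertex of the reconstructed body mesh, so "gains on contact prediction" are typically measured by comparing thresholded per-vertex predictions with binary ground-truth labels. The summary does not state which metrics or threshold GraphiContact reports; the sketch below assumes the common precision/recall/F1 formulation with a 0.5 threshold, and the function name `contact_scores` is hypothetical.

```python
from typing import Dict, Sequence


def contact_scores(probs: Sequence[float], labels: Sequence[int],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Score per-vertex contact predictions against binary ground truth.

    probs[i] is the predicted contact probability for mesh vertex i and
    labels[i] is 1 if vertex i touches the scene, else 0. The default
    threshold of 0.5 is an illustrative assumption, not the paper's protocol.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must cover the same vertices")
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        pred = p >= threshold  # binarize the per-vertex probability
        if pred and y:
            tp += 1  # predicted contact, true contact
        elif pred and not y:
            fp += 1  # predicted contact, no true contact
        elif not pred and y:
            fn += 1  # missed a true contact vertex
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy 6-vertex "mesh": vertices 0-2 are in contact (e.g. feet on a floor).
    probs = [0.9, 0.8, 0.3, 0.6, 0.1, 0.2]
    labels = [1, 1, 1, 0, 0, 0]
    print(contact_scores(probs, labels))
```

In the toy example one contact vertex is missed and one non-contact vertex is falsely flagged, so precision and recall are both 2/3. A real mesh has thousands of vertices, and datasets usually average such scores over images.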

Abstract

Monocular vertex-level human-scene contact prediction is a fundamental capability for interactive systems such as assistive monitoring, embodied AI, and rehabilitation analysis. In this work, we study this task jointly with single-image 3D human mesh reconstruction, using reconstructed body geometry as a scaffold for contact reasoning. Existing approaches either focus on contact prediction without sufficiently exploiting explicit 3D human priors, or emphasize pose/mesh reconstruction without directly optimizing robust vertex-level contact inference under occlusion and perceptual noise. To address this gap, we propose GraphiContact, a pose-aware framework that transfers complementary human priors from two pretrained Transformer encoders and predicts per-vertex human-scene contact on the reconstructed mesh. To improve robustness in real-world scenarios, we further introduce a Single-Image Multi-Infer Uncertainty (SIMU) training strategy with token-level adaptive routing, which simulates occlusion and noisy observations during training while preserving efficient single-branch inference at test time. Experiments on five benchmark datasets show that GraphiContact achieves consistent gains on both contact prediction and 3D human reconstruction. Our code, which implements GraphiContact for 3D human reconstruction and interaction analysis, will be publicly available at https://github.com/Aveiro-Lin/GraphiContact.
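The key property of a SIMU-style strategy is that token corruption happens only during training, so test-time cost stays at one forward pass. The sketch below illustrates that train/test asymmetry with a generic substitute: a fixed random per-token routing (occlude, add noise, or keep) applied to K views of one image's tokens, plus a per-vertex mean/variance over the K predictions as a simple uncertainty signal. It is not the authors' SIMU method. The paper's routing is adaptive (presumably learned), and every name and parameter here (`make_views`, `p_occlude`, `noise_std`, `k`) is a hypothetical assumption.

```python
import random
from typing import List, Tuple

Token = List[float]


def corrupt_tokens(tokens: List[Token], p_occlude: float, noise_std: float,
                   rng: random.Random) -> List[Token]:
    """Route each token to one of three hypothetical paths with fixed odds:
    occluded (zeroed), noisy (Gaussian jitter), or clean (unchanged)."""
    if not 0.0 <= p_occlude <= 0.5:
        raise ValueError("p_occlude must lie in [0, 0.5]")
    out = []
    for tok in tokens:
        r = rng.random()
        if r < p_occlude:
            out.append([0.0] * len(tok))  # simulate an occluded patch
        elif r < 2 * p_occlude:
            out.append([x + rng.gauss(0.0, noise_std) for x in tok])  # sensor noise
        else:
            out.append(list(tok))  # keep the token as observed
    return out


def make_views(tokens: List[Token], k: int, training: bool,
               p_occlude: float = 0.15, noise_std: float = 0.1,
               seed: int = 0) -> List[List[Token]]:
    """Training: k corrupted views of the same image's tokens.
    Inference: a single clean view, i.e. one forward branch."""
    if not training:
        return [tokens]
    rng = random.Random(seed)
    return [corrupt_tokens(tokens, p_occlude, noise_std, rng) for _ in range(k)]


def vertex_mean_and_variance(
        view_probs: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Per-vertex mean and (population) variance of contact probabilities
    predicted from the k views; high variance flags unstable vertices."""
    k = len(view_probs)
    n = len(view_probs[0])
    means = [sum(v[i] for v in view_probs) / k for i in range(n)]
    variances = [sum((v[i] - means[i]) ** 2 for v in view_probs) / k
                 for i in range(n)]
    return means, variances


if __name__ == "__main__":
    tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(len(make_views(tokens, k=4, training=True)))   # 4 training views
    print(len(make_views(tokens, k=4, training=False)))  # 1 inference view
```

Replacing the fixed probabilities in `corrupt_tokens` with a learned, input-dependent gate would move the sketch closer to the "token-level adaptive routing" the abstract names; the inference path would remain a single clean branch either way.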