Shared Representation for 3D Pose Estimation, Action Classification, and Progress Prediction from Tactile Signals

arXiv cs.CV / 3/30/2026

Key Points

  • The paper addresses how to estimate 3D human pose, classify actions, and predict action completion progress using tactile signals to avoid occlusion and privacy issues common in vision-based methods.
  • It proposes SCOTTI, a Shared Convolutional Transformer for Tactile Inference, which learns a shared representation to perform all three tasks jointly via multi-task learning.
  • The work claims novelty in exploring action progress prediction specifically from foot tactile signals using custom wireless insole sensors.
  • Experiments report that SCOTTI outperforms existing approaches on all three tasks, and that joint multi-task training improves on learning each task independently.
  • The authors introduce a new tactile dataset collected from 15 participants (7 hours total) performing eight activities, supporting training and evaluation of the proposed approach.

Abstract

Estimating human pose, classifying actions, and predicting movement progress are essential for human-robot interaction. While vision-based methods suffer from occlusion and privacy concerns in realistic environments, tactile sensing avoids these issues. However, prior tactile-based approaches handle each task separately, leading to suboptimal performance. In this study, we propose a Shared COnvolutional Transformer for Tactile Inference (SCOTTI) that learns a shared representation to simultaneously address three separate prediction tasks: 3D human pose estimation, action class categorization, and action completion progress estimation. To the best of our knowledge, this is the first work to explore action progress prediction using foot tactile signals from custom wireless insole sensors. This unified approach leverages the mutual benefits of multi-task learning, enabling the model to achieve improved performance across all three tasks compared to learning them independently. Experimental results demonstrate that SCOTTI outperforms existing approaches across all three tasks. Additionally, we introduce a novel dataset collected from 15 participants performing eight different activities and exercises, with a total duration of 7 hours.
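
The shared-representation idea can be illustrated with a minimal sketch: one encoder feeds three task heads, and a single weighted loss trains them jointly. This is not the SCOTTI architecture. The encoder here is a single dense layer standing in for the convolutional transformer, and the class name `SharedMultiTaskSketch`, the input size, the joint count, the loss choices, and the equal loss weights are all assumptions made for illustration. Only the three tasks and the eight activity classes come from the summary above.

```python
import math
import random


def linear(x, w, b):
    """Compute y = W x + b, with W given as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def init(rows, cols, rng):
    """Small random weight matrix (hypothetical initialization)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class SharedMultiTaskSketch:
    """One shared encoder, three heads: pose, action class, progress."""

    def __init__(self, in_dim, hidden, n_joints, n_actions, seed=0):
        rng = random.Random(seed)
        # Stand-in encoder: a dense layer, NOT a convolutional transformer.
        self.enc_w, self.enc_b = init(hidden, in_dim, rng), [0.0] * hidden
        self.pose_w, self.pose_b = init(n_joints * 3, hidden, rng), [0.0] * (n_joints * 3)
        self.act_w, self.act_b = init(n_actions, hidden, rng), [0.0] * n_actions
        self.prog_w, self.prog_b = init(1, hidden, rng), [0.0]

    def forward(self, x):
        # Shared features (ReLU) are computed once and reused by every head.
        h = [max(0.0, v) for v in linear(x, self.enc_w, self.enc_b)]
        pose = linear(h, self.pose_w, self.pose_b)           # flat (x, y, z) per joint
        probs = softmax(linear(h, self.act_w, self.act_b))   # action class probabilities
        logit = linear(h, self.prog_w, self.prog_b)[0]
        progress = 1.0 / (1.0 + math.exp(-logit))            # completion in (0, 1)
        return pose, probs, progress


def joint_loss(outputs, targets, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of per-task losses (assumed: MSE, cross-entropy, L1)."""
    pose, probs, progress = outputs
    pose_t, action_t, progress_t = targets
    l_pose = sum((p - t) ** 2 for p, t in zip(pose, pose_t)) / len(pose)
    l_act = -math.log(probs[action_t] + 1e-12)
    l_prog = abs(progress - progress_t)
    return weights[0] * l_pose + weights[1] * l_act + weights[2] * l_prog


# Hypothetical sizes: 16 pressure readings, 17 joints; 8 actions as in the dataset.
model = SharedMultiTaskSketch(in_dim=16, hidden=8, n_joints=17, n_actions=8)
outputs = model.forward([0.5] * 16)
loss = joint_loss(outputs, ([0.0] * 51, 3, 0.5))
```

Because all three losses backpropagate through the same encoder in a real training loop, each task acts as a regularizer for the others, which is the mutual benefit the abstract attributes to multi-task learning. The loss weights would normally be tuned; equal weights are used here only to keep the sketch short.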