SurfaceXR: Fusing Smartwatch IMUs and Egocentric Hand Pose for Seamless Surface Interactions

arXiv cs.CV / 3/23/2026

💬 Opinion · Models & Research

Key Points

  • SurfaceXR fuses headset-based hand tracking with smartwatch IMU data to enable robust surface-based inputs in XR, addressing fatigue and imprecision of mid-air gestures.
  • The approach leverages complementary modalities: 3D hand pose from vision and high-frequency motion from IMUs to improve accuracy on everyday surfaces.
  • A 21-participant study demonstrated improved touch tracking and 8-class gesture recognition compared with single-modality methods.
  • The work addresses egocentric hand-tracking failures and unreliable surface-plane estimation, offering a more comfortable and reliable interaction method for XR users.

Abstract

Mid-air gestures in Extended Reality (XR) often cause fatigue and imprecision. Surface-based interactions offer improved accuracy and comfort, but current egocentric vision methods struggle due to hand tracking challenges and unreliable surface plane estimation. We introduce SurfaceXR, a sensor fusion approach combining headset-based hand tracking with smartwatch IMU data to enable robust inputs on everyday surfaces. Our insight is that these modalities are complementary: hand tracking provides 3D positional data while IMUs capture high-frequency motion. A 21-participant study validates SurfaceXR's effectiveness for touch tracking and 8-class gesture recognition, demonstrating significant improvements over single-modality approaches.
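The complementary-modality idea can be illustrated with a minimal complementary-filter-style sketch: absolute but lower-rate (and occasionally lost) vision positions correct a drifting IMU double-integration. This is an illustrative assumption about how such fusion might look, not the paper's actual algorithm; all function and parameter names here are hypothetical.

```python
import numpy as np

def fuse_position(vision_pos, vision_valid, imu_accel, dt, alpha=0.98):
    """Illustrative complementary fusion of vision position and IMU acceleration.

    vision_pos:   (T, 3) fingertip positions from headset hand tracking
    vision_valid: (T,) bool mask, False where hand tracking lost the hand
    imu_accel:    (T, 3) linear acceleration from the watch IMU (gravity removed)
    dt:           sample period in seconds
    """
    T = len(vision_pos)
    fused = np.zeros((T, 3))
    vel = np.zeros(3)
    fused[0] = vision_pos[0]
    for t in range(1, T):
        # IMU prediction: double-integrate acceleration (high rate, but drifts)
        vel = vel + imu_accel[t] * dt
        predicted = fused[t - 1] + vel * dt
        if vision_valid[t]:
            # Blend toward the absolute vision position to cancel IMU drift
            fused[t] = alpha * predicted + (1 - alpha) * vision_pos[t]
        else:
            # Coast on the IMU through hand-tracking dropouts
            fused[t] = predicted
    return fused
```

In this toy formulation the IMU carries the estimate through tracking dropouts and sharp, high-frequency motion, while each valid vision sample pulls the estimate back toward an absolute position, which is the complementarity the summary describes.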