Leveraging Synthetic Data for Enhancing Egocentric Hand-Object Interaction Detection

arXiv cs.CV / 4/1/2026


Key Points

  • The paper studies how synthetic data can improve egocentric Hand-Object Interaction (HOI) detection, especially when labeled real data are scarce or missing.
  • Experiments across VISOR, EgoHOS, and ENIGMA-51 show that training with synthetic data plus only 10% of real labeled data increases Overall AP versus training on real data alone.
  • Reported gains are +5.67% on VISOR, +8.24% on EgoHOS, and +11.69% on ENIGMA-51, supporting synthetic data as a practical performance booster.
  • The authors find that synthetic-real alignment (objects, grasps, and environments) is a key factor, with effectiveness improving as alignment to real-world benchmarks improves.
  • They release a new synthetic data generation pipeline and the HOI-Synth benchmark, providing automatically annotated synthetic images with contact states, bounding boxes, and pixel-wise segmentation masks.
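The training protocol behind these numbers combines all synthetic images with a small labeled subset of the real data. A minimal sketch of that mixing step, with hypothetical record names (the actual HOI-Synth loaders and annotation schema are in the released code):

```python
import random

def mix_datasets(synthetic_samples, real_samples, real_fraction=0.10, seed=0):
    """Combine all synthetic samples with a small labeled subset of real data.

    Mirrors the paper's protocol of training on synthetic data plus only
    10% of the real labeled data; field names here are illustrative.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible real subset
    k = max(1, int(len(real_samples) * real_fraction))
    real_subset = rng.sample(real_samples, k)
    return synthetic_samples + real_subset

# Placeholder annotation records (hypothetical schema, not the paper's format):
synth = [{"img": f"synth_{i}.jpg", "contact": "no_contact"} for i in range(500)]
real = [{"img": f"real_{i}.jpg", "contact": "contact"} for i in range(100)]
train_set = mix_datasets(synth, real)
# len(train_set) == 500 + 10 == 510
```

In practice the same idea is usually expressed through a framework's dataset-concatenation utilities rather than plain lists, but the sampling logic is the same.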

Abstract

In this work, we explore the role of synthetic data in improving the detection of Hand-Object Interactions from egocentric images. Through extensive experimentation and comparative analysis on the VISOR, EgoHOS, and ENIGMA-51 datasets, our findings demonstrate the potential of synthetic data to significantly improve HOI detection, particularly when real labeled data are scarce or unavailable. By using synthetic data and only 10% of the real labeled data, we achieve improvements in Overall AP over models trained exclusively on real data, with gains of +5.67% on VISOR, +8.24% on EgoHOS, and +11.69% on ENIGMA-51. Furthermore, we systematically study how aligning synthetic data to specific real-world benchmarks with respect to objects, grasps, and environments affects performance, showing that the effectiveness of synthetic data consistently improves with better synthetic-real alignment. As a result of this work, we release a new data generation pipeline and the new HOI-Synth benchmark, which augments existing datasets with synthetic images of hand-object interactions. These data are automatically annotated with hand-object contact states, bounding boxes, and pixel-wise segmentation masks. All data, code, and tools for synthetic data generation are available at: https://fpv-iplab.github.io/HOI-Synth/.