SAVA-X: Ego-to-Exo Imitation Error Detection via Scene-Adaptive View Alignment and Bidirectional Cross View Fusion

arXiv cs.CV / 3/16/2026

Key Points

  • The paper formalizes Ego→Exo Imitation Error Detection: given asynchronous, length-mismatched ego and exo videos, a model must localize procedural steps on the ego timeline and decide whether each step is erroneous.
  • It identifies cross-view domain shift, temporal misalignment, and redundancy as core challenges that hinder baseline methods adapted from dense video captioning and temporal action detection.
  • The authors introduce SAVA-X, an Align-Fuse-Detect framework featuring view-conditioned adaptive sampling, scene-adaptive view embeddings, and bidirectional cross-attention fusion to address these challenges.
  • On the EgoMe benchmark, SAVA-X consistently improves AUPRC and mean tIoU over baselines, and the code is released on GitHub for replication.
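
The two metrics named in the last point can be illustrated with a minimal sketch. It assumes segments are (start, end) pairs in seconds, that mean tIoU averages each ground-truth step's best overlap with any predicted step, and that AUPRC is computed as step-wise average precision over per-step error scores. The paper's exact evaluation protocol may differ, and the function names below are hypothetical.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments."""
    ps, pe = pred
    gs, ge = gt
    inter = max(0.0, min(pe, ge) - max(ps, gs))
    union = (pe - ps) + (ge - gs) - inter
    return inter / union if union > 0 else 0.0


def mean_tiou(preds, gts):
    """Assumed definition: mean over ground-truth steps of the best tIoU."""
    if not gts:
        return 0.0
    best = [max((temporal_iou(p, g) for p in preds), default=0.0) for g in gts]
    return sum(best) / len(best)


def average_precision(scores, labels):
    """Step-wise area under the precision-recall curve.

    scores: predicted error scores; labels: 1 if the step is truly erroneous.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        return 0.0
    # Rank steps from most to least likely erroneous.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    true_pos = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


if __name__ == "__main__":
    print(temporal_iou((0.0, 10.0), (5.0, 15.0)))
    print(mean_tiou([(0.0, 10.0)], [(0.0, 10.0), (20.0, 30.0)]))
    print(average_precision([0.9, 0.8, 0.3], [1, 0, 1]))
```

A higher value is better for both metrics: tIoU rewards tight localization of steps on the ego timeline, while average precision rewards ranking truly erroneous steps above correct ones.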

Abstract

Error detection is crucial in industrial training, healthcare, and assembly quality control. Most existing work assumes a single-view setting and cannot handle the practical case where a third-person (exo) demonstration is used to assess a first-person (ego) imitation. We formalize Ego→Exo Imitation Error Detection: given asynchronous, length-mismatched ego and exo videos, the model must localize procedural steps on the ego timeline and decide whether each is erroneous. This setting introduces cross-view domain shift, temporal misalignment, and heavy redundancy. Under a unified protocol, we adapt strong baselines from dense video captioning and temporal action detection and show that they struggle in this cross-view regime. We then propose SAVA-X, an Align-Fuse-Detect framework with (i) view-conditioned adaptive sampling, (ii) scene-adaptive view embeddings, and (iii) bidirectional cross-attention fusion. On the EgoMe benchmark, SAVA-X consistently improves AUPRC and mean tIoU over all baselines, and ablations confirm the complementary benefits of its components. Code is available at https://github.com/jack1ee/SAVAX.
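
To make the idea of bidirectional cross-attention fusion concrete, here is a generic sketch in NumPy. It is not the authors' implementation: it assumes single-head scaled dot-product attention with no learned projections, a plain residual connection, and random features standing in for video encodings, and all names are hypothetical. It shows only how two sequences of different lengths can each attend to the other while keeping their own timelines.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_weights(query, context):
    """Scaled dot-product weights of shape (len(query), len(context))."""
    d = query.shape[-1]
    return softmax(query @ context.T / np.sqrt(d), axis=-1)


def cross_attend(query, context):
    """Each query frame gathers a weighted mix of context frames."""
    return attention_weights(query, context) @ context


def bidirectional_fusion(ego, exo):
    """Ego attends to exo and exo attends to ego, each with a residual.

    Assumption: no learned projections or view embeddings are applied here.
    """
    ego_fused = ego + cross_attend(ego, exo)
    exo_fused = exo + cross_attend(exo, ego)
    return ego_fused, exo_fused


# Hypothetical features: 12 ego frames and 20 exo frames, 16 dimensions each.
rng = np.random.default_rng(0)
ego = rng.normal(size=(12, 16))
exo = rng.normal(size=(20, 16))
ego_fused, exo_fused = bidirectional_fusion(ego, exo)
print(ego_fused.shape, exo_fused.shape)
```

Because attention weights are computed between every ego frame and every exo frame, the two videos need not be synchronized or equal in length, and the fused ego sequence keeps the ego timeline on which steps are localized.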