EgoXtreme: A Dataset for Robust Object Pose Estimation in Egocentric Views under Extreme Conditions

arXiv cs.CV / 3/27/2026


Key Points

  • EgoXtreme is a newly introduced, large-scale 6D object pose estimation dataset captured entirely from egocentric (smart-glass-like) views to better reflect real-world challenges missing from existing benchmarks.
  • The dataset includes three extreme scenarios—industrial maintenance, sports, and emergency rescue—designed to induce severe motion blur, dynamic lighting, obstructions, and smoke.
  • Experiments show that state-of-the-art pose estimators do not generalize well to EgoXtreme, with especially poor performance under low-light conditions.
  • The study finds that straightforward image restoration (e.g., deblurring) alone does not improve pose estimation in these extreme settings, while tracking-based methods benefit from temporal information.
  • The authors provide the dataset and code publicly, positioning EgoXtreme as a resource to develop next-generation robust egocentric pose estimation models.

Abstract

Smart glasses are emerging as a useful device class, providing rich insights in hands-busy, eyes-on-task situations. To understand the wearer's context, 6D object pose estimation from an egocentric view is becoming essential. However, existing 6D object pose estimation benchmarks fail to capture the challenges of real-world egocentric applications, which are often dominated by severe motion blur, dynamic illumination, and visual obstructions. This discrepancy creates a significant gap between controlled lab data and chaotic real-world applications. To bridge this gap, we introduce EgoXtreme, a new large-scale 6D pose estimation dataset captured entirely from an egocentric perspective. EgoXtreme features three challenging scenarios - industrial maintenance, sports, and emergency rescue - designed to introduce severe perceptual ambiguities through extreme lighting, heavy motion blur, and smoke. Evaluations of state-of-the-art generalizable pose estimators on EgoXtreme indicate that their generalization fails to hold in extreme conditions, especially under low light. We further demonstrate that simply applying image restoration (e.g., deblurring) offers no improvement in these extreme conditions, whereas tracking-based approaches do show gains, implying that exploiting temporal information in fast-motion scenarios is worthwhile. We conclude that EgoXtreme is an essential resource for developing and evaluating the next generation of pose estimation models robust enough for real-world egocentric vision. The dataset and code are available at https://taegyoun88.github.io/EgoXtreme/