Fringe Projection Based Vision Pipeline for Autonomous Hard Drive Disassembly

arXiv cs.RO / 4/21/2026


Key Points

  • The paper proposes an autonomous vision pipeline for robotic hard-drive disassembly that combines fringe projection 3D sensing with fast, real-time scene understanding and fastener/component localization.
  • It uses a fringe projection profilometry (FPP) module for 3D sensing, and conditionally triggers a depth-completion module when FPP fails, improving robustness across difficult sensing conditions.
  • By reusing the same camera–projector (FPP) hardware for both depth sensing and component localization, the system produces pixel-wise aligned depth/3D geometry and segmentation masks without additional registration.
  • The approach is optimized for deployment, reporting strong instance-segmentation performance (box mAP@50 0.960, mask mAP@50 0.957), accurate depth completion (RMSE 2.317 mm, MAE 1.836 mm), and real-time latency/throughput (12.86 ms, 77.7 FPS).
  • It applies sim-to-real transfer learning to expand the physical dataset and plans to publicly release a synthetic dataset for HDD instance segmentation.
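The conditional fallback described above (trigger depth completion only where FPP fails) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `completion_model` callable, the NaN-based invalid mask, and the 5% trigger threshold are all assumptions for the example.

```python
import numpy as np

def fuse_depth(fpp_depth, completion_model, invalid_frac_threshold=0.05):
    """Selectively trigger depth completion when FPP leaves too many holes.

    fpp_depth: HxW depth map with NaN where FPP failed (e.g. specular or
    low-reflectance regions). completion_model: any callable that fills a
    depth map with holes (hypothetical interface, not the paper's network).
    """
    invalid = np.isnan(fpp_depth)
    if invalid.mean() <= invalid_frac_threshold:
        # FPP succeeded almost everywhere; no completion pass needed.
        return fpp_depth
    completed = completion_model(fpp_depth)
    # Keep trusted FPP measurements; fill only the failed pixels.
    return np.where(invalid, completed, fpp_depth)
```

Because the completion network runs only on frames where FPP degrades, the common case keeps the low latency of pure fringe-projection sensing.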

Abstract

Unrecovered e-waste represents a significant economic loss. Hard disk drives (HDDs) comprise a valuable e-waste stream necessitating robotic disassembly. Automating the disassembly of HDDs requires holistic 3D sensing, scene understanding, and fastener localization; however, current methods are fragmented and lack both robust 3D sensing and fastener localization. We propose an autonomous vision pipeline that performs 3D sensing using a Fringe Projection Profilometry (FPP) module, with selective triggering of a depth completion module where FPP fails, and integrates this module with a lightweight, real-time instance segmentation network for scene understanding and critical component localization. By utilizing the same FPP camera-projector system for both our depth sensing and component localization modules, our depth maps and derived 3D geometry are inherently pixel-wise aligned with the segmentation masks without registration, providing an advantage over the RGB-D perception systems common in industrial sensing. We optimize both our trained depth completion and instance segmentation networks for deployment-oriented inference. The proposed system achieves a box mAP@50 of 0.960 and a mask mAP@50 of 0.957 for instance segmentation, while the selected depth completion configuration with the Depth Anything V2 Base backbone achieves an RMSE of 2.317 mm and an MAE of 1.836 mm; the Platter Facing learned inference stack achieved a combined latency of 12.86 ms and a throughput of 77.7 Frames Per Second (FPS) on the evaluation workstation. Finally, we adopt a sim-to-real transfer learning approach to augment our physical dataset. The proposed perception pipeline provides high-fidelity semantic and spatial data valuable for downstream robotic disassembly. The synthetic dataset developed for HDD instance segmentation will be made publicly available.
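The FPP sensing at the core of the pipeline typically recovers depth from the phase of projected sinusoidal fringes. As background, the standard N-step phase-shifting computation is sketched below; this is textbook FPP math, not the paper's specific code, and the four-step configuration is an assumption for the example.

```python
import numpy as np

def wrapped_phase(images):
    """N-step phase shifting: recover the wrapped phase from fringe images.

    images: stack of N images I_n = A + B * cos(phi + 2*pi*n/N), where A is
    the background intensity, B the modulation, and phi the phase encoding
    the projector coordinate (and hence, after calibration, depth).
    Returns phi wrapped to (-pi, pi]; unwrapping and triangulation follow.
    """
    I = np.asarray(images, dtype=float)
    N = I.shape[0]
    delta = 2 * np.pi * np.arange(N) / N  # phase shifts of the N patterns
    # Sum over the pattern axis; for this I_n model:
    #   num = -(N/2) * B * sin(phi),  den = (N/2) * B * cos(phi)
    num = np.tensordot(np.sin(delta), I, axes=1)
    den = np.tensordot(np.cos(delta), I, axes=1)
    return np.arctan2(-num, den)
```

The arctangent cancels both the background term A and the modulation B per pixel, which is why phase-shifting FPP is robust to shading variation across the drive surface.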