Sim-to-Real Transfer and Robustness Evaluation of Reinforcement Learning Control with Integrated Perception on an ASV for Floating Waste Capture

arXiv cs.RO / 5/5/2026


Key Points

  • The study presents a field-validated autonomous surface vessel (ASV) system that uses camera-based polarimetric perception together with a lightweight deep reinforcement learning (DRL) controller to detect and capture floating waste under real-world hydrodynamics and perception challenges.
  • A key contribution is a sim-to-real evaluation methodology that uses a two-stage simulation protocol and a perception abstraction module to better mimic real camera behavior, making the sim-to-real gap measurable and reproducible.
  • The framework is tested in both matched simulations and field experiments across 14 disturbance regimes, showing centimeter-level terminal accuracy and generally robust control performance.
  • The authors identify insufficient actuation-model fidelity as the primary driver of performance degradation, and they outline practical transfer improvements such as higher-fidelity actuation modeling, targeted domain randomization, and tighter latency/timestamp handling across system modules.
  • The system is also demonstrated in a real search-and-capture task using real camera detections over areas up to 450 m², indicating operational viability beyond purely simulated conditions.

Abstract

Autonomous surface vessels (ASVs) for floating-waste removal operate under varying hydrodynamics, external disturbances, and challenging water-surface perception. We present a field-validated system that combines camera-based polarimetric perception with a lightweight deep reinforcement learning (DRL) controller for floating-waste detection and capture. Camera detections are converted into water-surface target points and tracked by a controller trained entirely in simulation and deployed directly on a retrofitted ASV platform. Our main contribution is a sim-to-real testing methodology that combines a two-stage simulation protocol with a perception abstraction module designed to mimic real camera behavior, enabling reproducible field trials and explicit evaluation of the sim-to-real gap. We apply this framework in matched simulation and field experiments across 14 disturbance regimes to expose failure modes and evaluate robustness. The results show centimeter-level terminal accuracy and indicate robust control performance under the evaluated perturbation regimes. The main source of degradation is insufficient actuation-model fidelity. We also demonstrate the system in a search-and-capture application using real camera detections in real-world conditions over areas of up to 450 m². The study distills practical lessons for reliable transfer, including improved actuation-model fidelity, targeted domain randomization, and careful management of latency and timestamps across modules, while highlighting remaining challenges.
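
The abstract states that camera detections are converted into water-surface target points but does not say how. One common way to do this is to intersect the pixel's viewing ray with the water plane. The sketch below illustrates that idea only and should not be read as the authors' method. It assumes an ideal pinhole camera without lens distortion, a flat water surface, and a camera that is pitched downward with no roll or yaw relative to the vessel. The function name, the parameters, and the vessel frame convention (x forward, y to port, z up) are all hypothetical.

```python
import math
from typing import Optional, Tuple


def pixel_to_water_point(
    u: float, v: float,
    fx: float, fy: float, cx: float, cy: float,
    cam_height_m: float, pitch_down_rad: float,
) -> Optional[Tuple[float, float]]:
    """Project pixel (u, v) onto a flat water plane (hypothetical helper).

    Returns (forward_m, left_m) measured from the point on the water
    directly below the camera, or None if the viewing ray does not hit
    the water (pixel at or above the horizon).
    """
    # Normalized image coordinates: x to the right, y downward in the image.
    xn = (u - cx) / fx
    yn = (v - cy) / fy

    s, c = math.sin(pitch_down_rad), math.cos(pitch_down_rad)

    # Viewing ray expressed in the vessel frame (x forward, y left, z up),
    # for a camera rotated downward by pitch_down_rad about its lateral axis.
    d_forward = c - yn * s
    d_left = -xn
    d_up = -s - yn * c

    # The ray must point downward to intersect the water plane z = 0.
    if d_up >= -1e-9:
        return None

    # Scale the ray so that it descends exactly cam_height_m.
    t = cam_height_m / -d_up
    return (t * d_forward, t * d_left)


# Camera 1 m above the water, pitched 45 degrees down, pixel at the image center.
point = pixel_to_water_point(320, 240, 500, 500, 320, 240, 1.0, math.pi / 4)
```

With these assumed values the image center maps to a point about 1 m ahead of the camera on its centerline. A fielded system would also need to account for vessel roll and pitch, lens distortion, and waves, all of which this sketch leaves out.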