RopeDreamer: A Kinematic Recurrent State Space Model for Dynamics of Flexible Deformable Linear Objects

arXiv cs.RO / 5/1/2026


Key Points

  • The paper presents RopeDreamer, a latent dynamics model for predicting the behavior of deformable linear objects (DLOs) under complex, contact-rich robotic manipulation.
  • It combines a Recurrent State Space Model with a quaternion-based kinematic chain representation to enforce physical validity, including constant link lengths and manifold-constrained motion.
  • A dual-decoder design separates state reconstruction from future-state prediction, encouraging the latent space to learn deformation physics rather than purely fitting observations.
  • Experiments on large-scale simulated pick-and-place trajectories with self-intersections show a 40.52% reduction in open-loop prediction error over 50-step horizons versus a strong baseline.
  • The approach also cuts inference time by 31.17% and maintains better topological consistency across multiple crossings, supporting long-horizon manipulation planning.
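The paper does not provide code, but the core of the quaternion-based kinematic chain idea can be illustrated in a short sketch: if a rope is decoded as a sequence of relative rotations applied to fixed-length links (rather than free Cartesian points), link lengths are constant by construction, no matter what raw values a network emits. All function names, the reference link direction, and the link length below are hypothetical choices for illustration, not the paper's implementation.

```python
import numpy as np

def quat_mul(q, r):
    # Hamilton product of quaternions in (w, x, y, z) order.
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def quat_rotate(q, v):
    # Rotate 3-vector v by unit quaternion q: q * (0, v) * conj(q).
    qv = np.concatenate([[0.0], v])
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    return quat_mul(quat_mul(q, qv), q_conj)[1:]

def chain_to_positions(rel_quats, link_len=0.05):
    # Decode a DLO state from relative rotations: each link direction is
    # the previous one rotated by the next (normalized) quaternion.
    pos = [np.zeros(3)]
    q_acc = np.array([1.0, 0.0, 0.0, 0.0])  # accumulated world rotation
    for q in rel_quats:
        q = q / np.linalg.norm(q)            # project onto the unit manifold
        q_acc = quat_mul(q_acc, q)
        direction = quat_rotate(q_acc, np.array([1.0, 0.0, 0.0]))
        pos.append(pos[-1] + link_len * direction)
    return np.stack(pos)

# Arbitrary raw network outputs: after normalization, every decoded
# configuration lies on the constant-link-length manifold.
raw = np.random.default_rng(0).normal(size=(20, 4))
pts = chain_to_positions(raw, link_len=0.05)
lengths = np.linalg.norm(np.diff(pts, axis=0), axis=1)
```

Because a unit quaternion rotation preserves vector norms, `lengths` is identically `0.05` regardless of the raw inputs; this is the "physically valid manifold" property the paper attributes to the representation.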

Abstract

The robotic manipulation of Deformable Linear Objects (DLOs) is a fundamental challenge due to the high-dimensional, non-linear dynamics of flexible structures and the complexity of maintaining topological integrity during contact-rich tasks. While recent data-driven methods have utilized Recurrent and Graph Neural Networks for dynamics modeling, they often struggle with self-intersections and non-physical deformations, such as tangling and link stretching. In this paper, we propose a latent dynamics framework that combines a Recurrent State Space Model with a Quaternionic Kinematic Chain representation to enable robust, long-term forecasting of DLO states. By encoding the DLO as a sequence of relative rotations (quaternions) rather than independent Cartesian positions, we inherently constrain the model to a physically valid manifold that preserves link-length constancy. Furthermore, we introduce a dual-decoder architecture that decouples state reconstruction from future-state prediction, forcing the latent space to capture the underlying physics of deformation. We evaluate our approach on a large-scale simulated dataset of complex pick-and-place trajectories involving self-intersections. Our results demonstrate that the proposed model achieves a 40.52% reduction in open-loop prediction error over 50-step horizons compared to the state-of-the-art baseline, while reducing inference time by 31.17%. Our model further maintains superior topological consistency in scenarios with multiple crossings, proving its efficacy as a compositional primitive for long-horizon manipulation planning.
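To make the dual-decoder idea concrete, here is a toy numpy sketch of a recurrent latent model with two output heads: one reconstructs the current observation, the other predicts the next state and drives open-loop rollouts. All dimensions, weights, and function names are illustrative assumptions, not the paper's architecture; in practice the parameters would be learned, not random.

```python
import numpy as np

rng = np.random.default_rng(0)
D_OBS, D_ACT, D_LAT = 8, 3, 16  # hypothetical sizes

# Toy parameters (random here; learned in a real model).
W_in    = rng.normal(scale=0.1, size=(D_LAT, D_OBS + D_ACT))
W_rec   = rng.normal(scale=0.1, size=(D_LAT, D_LAT))
W_recon = rng.normal(scale=0.1, size=(D_OBS, D_LAT))  # reconstructs o_t
W_pred  = rng.normal(scale=0.1, size=(D_OBS, D_LAT))  # predicts o_{t+1}

def step(h, obs, act):
    # Recurrent latent update from the previous state, observation, action.
    x = np.concatenate([obs, act])
    return np.tanh(W_rec @ h + W_in @ x)

def rollout(h, actions):
    # Open-loop forecast: the prediction head's output is fed back as the
    # next observation, so no ground truth is needed after the first step.
    traj = []
    obs = W_pred @ h
    for a in actions:
        h = step(h, obs, a)
        obs = W_pred @ h
        traj.append(obs)
    return np.stack(traj)

h0 = np.zeros(D_LAT)
acts = rng.normal(size=(50, D_ACT))
future = rollout(h0, acts)  # a 50-step open-loop forecast, as in the paper's horizon
```

Training the reconstruction head (`W_recon`) and the prediction head (`W_pred`) on separate objectives is what, per the abstract, pushes the latent state to encode deformation dynamics rather than memorize observations.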