Metric, inertially aligned monocular state estimation via kinetodynamic priors

arXiv cs.RO / 4/29/2026


Key Points

  • The paper proposes an inertially aligned monocular state estimation method for robots with dynamically deforming (non-rigid) structures that break rigid-body assumptions.
  • It combines a learned deformation-force model (implemented with a Multi-Layer Perceptron) with a continuous-time kinematic model using B-splines to represent smooth platform motion.
  • By continuously enforcing Newton’s Second Law, the approach links vision-derived trajectory acceleration to deformation-induced acceleration, improving state estimation consistency.
  • The authors show that accurately modeled platform physics can enable recovery of inertial sensing properties, and they validate this on a spring-camera setup.
  • On the spring-camera system, the method robustly recovers metric scale and gravity, a problem that is typically ill-posed in monocular visual odometry.
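
The continuous-time part of the second key point can be illustrated with a uniform cubic B-spline, whose second derivative is available in closed form. This is a minimal sketch, not the authors' implementation: the function name `cubic_bspline_eval`, the uniform knot spacing `dt`, the control-point layout, and the choice of a cubic spline are all assumptions made for illustration.

```python
import numpy as np

def cubic_bspline_eval(ctrl, dt, t):
    """Evaluate position and acceleration of a uniform cubic B-spline.

    ctrl: (N, 3) control points (hypothetical layout)
    dt:   knot spacing in seconds (assumed uniform)
    t:    query time, with 0 <= t < (N - 3) * dt
    """
    ctrl = np.asarray(ctrl, dtype=float)
    i = int(np.floor(t / dt))      # index of the active spline segment
    u = t / dt - i                 # normalized time within the segment, in [0, 1)
    P = ctrl[i:i + 4]              # the four control points that influence time t
    # Uniform cubic B-spline basis functions evaluated at u.
    b = np.array([(1 - u) ** 3,
                  3 * u ** 3 - 6 * u ** 2 + 4,
                  -3 * u ** 3 + 3 * u ** 2 + 3 * u + 1,
                  u ** 3]) / 6.0
    # Second derivatives of the same basis functions with respect to u.
    b_dd = np.array([1 - u, 3 * u - 2, 1 - 3 * u, u])
    pos = b @ P
    acc = (b_dd @ P) / dt ** 2     # chain rule: d/dt = (1/dt) d/du, applied twice
    return pos, acc

# Control points sampled from a constant-acceleration path (synthetic data).
dt = 0.1
a_true = np.array([0.0, 0.0, -9.81])
knots = np.arange(8) * dt
ctrl = 0.5 * a_true * knots[:, None] ** 2
pos, acc = cubic_bspline_eval(ctrl, dt, t=0.23)
```

Because the acceleration is a linear function of the control points, it can be compared against a force-derived acceleration at any time instant, not only at camera frame times.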

Abstract

Accurate state estimation for flexible robotic systems poses significant challenges, particularly for platforms with dynamically deforming structures that invalidate rigid-body assumptions. This paper addresses this problem and enables the extension of existing rigid-body pose estimation methods to non-rigid systems. Our approach integrates two core components: first, we capture elastic properties using a deformation-force model, efficiently learned via a Multi-Layer Perceptron; second, we resolve the platform's inherently smooth motion using continuous-time B-spline kinematic models. By continuously applying Newton's Second Law, our method formulates the relationship between visually derived trajectory acceleration and predicted deformation-induced acceleration. We demonstrate that our approach enables robust and accurate pose estimation on non-rigid platforms, and that properly modeled platform physics allows for the recovery of inertial sensing properties. We validate this feasibility on a simple spring-camera system, showing how it robustly resolves the typically ill-posed problem of metric scale and gravity recovery in monocular visual odometry.
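
To see why applying Newton's Second Law can make scale and gravity recoverable, consider a deliberately simplified, hypothetical model: a point-mass camera whose monocular trajectory is known only up to a scale factor s, so its metric acceleration is s * a_vis(t), and a spring force f(t) predicted from the observed deformation. The relation m * s * a_vis(t) = f(t) + m * g is then linear in the unknowns s and g and can be solved by least squares. The sketch below uses synthetic data and assumes that all quantities are expressed in one common frame and that the mass is known; the function name `recover_scale_and_gravity` is illustrative, and the paper's actual formulation (learned MLP force model, B-spline trajectory) is richer than this.

```python
import numpy as np

def recover_scale_and_gravity(a_vis, f, mass):
    """Solve s * a_vis[k] - g = f[k] / mass for scale s and gravity g.

    a_vis: (K, 3) up-to-scale accelerations from vision
    f:     (K, 3) forces predicted from deformation
    mass:  camera mass in kg (assumed known in this sketch)
    """
    K = a_vis.shape[0]
    A = np.zeros((3 * K, 4))
    A[:, 0] = a_vis.reshape(-1)               # column multiplying the scale s
    A[:, 1:] = np.tile(-np.eye(3), (K, 1))    # columns multiplying gravity g
    b = (f / mass).reshape(-1)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[0], x[1:]

# Synthetic data generated from the same simplified model.
rng = np.random.default_rng(0)
s_true = 2.5
g_true = np.array([0.0, 0.0, -9.81])
mass = 0.2
a_vis = rng.normal(size=(50, 3))              # sufficiently varied motion
f = mass * (s_true * a_vis - g_true)
f += rng.normal(scale=1e-3, size=f.shape)     # small force-prediction noise

s_est, g_est = recover_scale_and_gravity(a_vis, f, mass)
```

If the visual acceleration is constant over the window (including zero), the scale column becomes a linear combination of the gravity columns and s cannot be separated from g, so in this simplified setting the motion must be sufficiently varied.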