CART: Context-Aware Terrain Adaptation using Temporal Sequence Selection for Legged Robots

arXiv cs.RO · April 17, 2026


Key Points

  • The paper introduces CART, a context-aware terrain adaptation controller that fuses proprioception (internal sensing) with exteroception (e.g., vision) to better understand uneven terrain for legged robots.
  • It argues that many existing experience-driven methods can fail in complex off-road settings due to reliance on vision observations, leading to a “Visual-Texture Paradox” between what the robot sees and what it actually feels.
  • CART is evaluated on multiple terrains using ANYmal-C in IsaacSim simulation and Boston Dynamics SPOT on real hardware, with vibrational base stability used as a metric for learned contextual terrain properties.
  • Compared with state-of-the-art multimodal baselines, CART delivers a 5% average success-rate improvement in simulation and boosts real-world stability by up to 45% (and 24% overall) without increasing locomotion task time.
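The paper reports vibrational base stability as its metric for learned terrain properties but the summary does not give the exact formulation. One plausible version, purely as an illustrative sketch, measures the RMS deviation of the base's vertical IMU acceleration from gravity (the function name and formula below are assumptions, not CART's actual definition):

```python
import math

def vibrational_stability(accel_z, g=9.81):
    """RMS deviation of vertical base acceleration from gravity (m/s^2).

    `accel_z` is a sequence of IMU vertical-acceleration samples.
    Lower values indicate a steadier base. Illustrative metric only,
    not the paper's exact formulation.
    """
    n = len(accel_z)
    return math.sqrt(sum((a - g) ** 2 for a in accel_z) / n)
```

A perfectly steady base (every sample equal to `g`) scores 0, and the score grows with vibration amplitude, which matches the intuition that smaller values mean more stable locomotion.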

Abstract

Animals in nature combine multiple modalities, such as sight and touch, to perceive terrain and learn how to walk stably on uneven ground. Similarly, legged robots must learn to walk stably on complex terrain by understanding the relationship between vision and proprioception. Most current terrain adaptation methods are susceptible to failure on complex, off-road terrain because they rely on prior experience, particularly observations from a vision sensor. This experience-based learning often creates a Visual-Texture Paradox between what has been seen and how the terrain actually feels. In this work, we introduce CART, a high-level controller built on a context-aware terrain adaptation approach that integrates proprioception and exteroception from onboard sensing to achieve a robust understanding of terrain. We evaluate our method on multiple terrains using an ANYmal-C robot in the IsaacSim simulator and a Boston Dynamics SPOT robot in real-world experiments. To evaluate the learned contextual terrain properties, we adopt the vibrational stability of the robot's base as a metric. We compare CART with various state-of-the-art baselines equipped with multimodal sensing in both simulation and the real world. CART achieves an average success-rate improvement of 5% over all baselines in simulation and improves stability by up to 45% (24% overall) in the real world, without increasing the time the robot takes to accomplish locomotion tasks.
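The abstract describes fusing proprioception and exteroception into a single terrain understanding, but gives no architectural details. A minimal sketch of one common pattern, concatenating the two feature vectors and passing them through a single linear layer to produce a terrain-context embedding (the function, shapes, and layer are hypothetical illustrations, not CART's actual architecture):

```python
def fuse_context(proprio, extero, weights, bias):
    """Fuse proprioceptive and exteroceptive features into one
    terrain-context vector via a single linear layer.

    `proprio` and `extero` are flat feature lists; `weights` is a
    matrix (one row per output dimension) applied to their
    concatenation, and `bias` is the per-output offset.
    Hypothetical sketch of multimodal fusion, not the paper's model.
    """
    x = list(proprio) + list(extero)  # simple concatenation fusion
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
            for row, b in zip(weights, bias)]

# Example: 2 proprioceptive + 1 exteroceptive feature -> 2-D context
ctx = fuse_context([1.0, 2.0], [3.0],
                   weights=[[1.0, 1.0, 1.0], [0.0, 1.0, 0.0]],
                   bias=[0.0, 1.0])
```

In practice such a fusion layer would be learned end-to-end; concatenation-then-projection is shown here only because it is the simplest instance of combining the two sensing streams the abstract names.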