CoRAL: Contact-Rich Adaptive LLM-based Control for Robotic Manipulation

arXiv cs.RO / 5/5/2026


Key Points

  • CoRAL (Contact-Rich Adaptive LLM-based control) is a modular framework that applies LLMs to contact-rich robotic manipulation by separating high-level reasoning from low-level adaptive control.
  • Instead of using an LLM as a black-box controller, CoRAL uses the LLM to design context-aware cost functions for a sampling-based motion planner (MPPI), enabling zero-shot planning.
  • The system adds a neuro-symbolic adaptation loop where a VLM supplies semantic priors (e.g., mass and friction) and online system identification refines physical parameters in real time based on interaction feedback.
  • CoRAL also includes a retrieval-based memory unit to reuse previously successful strategies across repeated or related tasks, improving performance under recurring contact scenarios.
  • In simulation and real-world hardware tests, CoRAL outperforms state-of-the-art VLA and foundation-model planners, improving success rates by over 50% on average in unseen contact-rich tasks and bridging the sim-to-real gap through its adaptive physical understanding.
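The cost-designer idea in the second bullet can be illustrated with a toy MPPI loop. This is a minimal sketch, not the paper's implementation: `llm_cost` is a hypothetical stand-in for the context-aware objective an LLM would synthesize, and the 1-D point-mass dynamics are invented for illustration.

```python
import numpy as np

np.random.seed(0)

def mppi_step(x0, dynamics, cost_fn, horizon=20, samples=256, lam=1.0, sigma=0.5):
    """One MPPI update: sample noisy control sequences, roll out the
    dynamics, and average the noise weighted by exponentiated negative cost."""
    u_nom = np.zeros(horizon)                      # nominal control sequence
    noise = np.random.randn(samples, horizon) * sigma
    costs = np.zeros(samples)
    for k in range(samples):
        x = x0
        for t in range(horizon):
            x = dynamics(x, u_nom[t] + noise[k, t])
            costs[k] += cost_fn(x, u_nom[t] + noise[k, t])
    w = np.exp(-(costs - costs.min()) / lam)       # importance weights
    w /= w.sum()
    return u_nom + w @ noise                       # weighted control update

# Hypothetical stand-in for an LLM-synthesized, context-aware cost:
# penalize distance to a goal contact location plus a small control effort term.
goal = 1.0
llm_cost = lambda x, u: (x - goal) ** 2 + 0.01 * u ** 2

# Toy 1-D point-mass "push" dynamics (dt = 0.1), for illustration only.
dyn = lambda x, u: x + 0.1 * u

u_seq = mppi_step(x0=0.0, dynamics=dyn, cost_fn=llm_cost)
```

The key design point CoRAL exploits is that only `llm_cost` needs semantic context; the MPPI sampler itself stays fixed and real-time capable, so slow LLM inference never sits in the control loop.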

Abstract

While Large Language Models (LLMs) and Vision-Language Models (VLMs) demonstrate remarkable capabilities in high-level reasoning and semantic understanding, applying them directly to contact-rich manipulation remains a challenge due to their lack of explicit physical grounding and inability to perform adaptive control. To bridge this gap, we propose CoRAL (Contact-Rich Adaptive LLM-based control), a modular framework that enables zero-shot planning by decoupling high-level reasoning from low-level control. Unlike black-box policies, CoRAL uses LLMs not as direct controllers, but as cost designers that synthesize context-aware objective functions for a sampling-based motion planner (MPPI). To address the ambiguity of physical parameters in visual data, we introduce a neuro-symbolic adaptation loop: a VLM provides semantic priors for environmental dynamics, such as mass and friction estimates, which are then explicitly refined in real time via online system identification, while the LLM iteratively modulates the cost-function structure to correct strategic errors based on interaction feedback. Furthermore, a retrieval-based memory unit allows the system to reuse successful strategies across recurrent tasks. This hierarchical architecture ensures real-time control stability by decoupling high-level semantic reasoning from reactive execution, effectively bridging the gap between slow LLM inference and dynamic contact requirements. We validate CoRAL on both simulation and real-world hardware across challenging and novel tasks, such as flipping objects against walls by leveraging extrinsic contacts. Experiments demonstrate that CoRAL outperforms state-of-the-art VLA and foundation-model-based planner baselines by boosting success rates by over 50% on average in unseen contact-rich scenarios, effectively handling sim-to-real gaps through its adaptive physical understanding.
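The "VLM prior + online system identification" loop in the abstract can be sketched with a standard recursive least-squares (RLS) estimator. Everything here is an assumption for illustration: the paper does not specify its estimator, and the sliding-friction model `F = m*a + mu*m*g*sign(v)`, the prior values, and the simulated feedback data are all hypothetical.

```python
import numpy as np

# Assumed linear-in-parameters sliding model: F = m*a + mu*m*g*sign(v).
# Parameter vector theta = [m, mu*m*g]; feature vector phi = [a, sign(v)].
g = 9.81
theta = np.array([1.0, 0.3 * 1.0 * g])    # hypothetical VLM prior: m=1.0 kg, mu=0.3
P = np.eye(2) * 10.0                       # prior covariance (estimation uncertainty)

def rls_update(theta, P, phi, y, forget=0.99):
    """Recursive least squares with a forgetting factor: refine theta
    from one (features, measurement) pair of interaction feedback."""
    K = P @ phi / (forget + phi @ P @ phi)   # gain vector
    theta = theta + K * (y - phi @ theta)    # correct by prediction error
    P = (P - np.outer(K, phi @ P)) / forget  # shrink covariance
    return theta, P

# Simulated interaction feedback with "true" m=2.0 kg, mu=0.5 (made up).
rng = np.random.default_rng(0)
for _ in range(200):
    a, v = rng.uniform(0.1, 2.0), 1.0
    F = 2.0 * a + 0.5 * 2.0 * g * np.sign(v) + rng.normal(0.0, 0.05)
    theta, P = rls_update(theta, P, np.array([a, np.sign(v)]), F)

m_hat = theta[0]                   # refined mass estimate
mu_hat = theta[1] / (m_hat * g)    # refined friction estimate
```

The point of this pattern in CoRAL's architecture is that the VLM only has to supply a rough prior; the cheap per-step RLS correction does the physical grounding, which is what lets the same policy absorb sim-to-real parameter mismatch.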