Build on Priors: Vision-Language-Guided Neuro-Symbolic Imitation Learning for Data-Efficient Real-World Robot Manipulation
arXiv cs.RO / 4/7/2026
Key Points
- The paper tackles data-efficient long-horizon robot manipulation by proposing an automated neuro-symbolic imitation learning pipeline that learns from between one and thirty unannotated skill demonstrations.
- It segments demonstrations into skills, then uses a vision-language model (VLM) to classify each skill and to identify equivalent high-level states, yielding an automatically constructed state-transition graph (see the first sketch after this list).
- An Answer Set Programming (ASP) solver converts this graph into a synthesized PDDL planning domain, which is then used to isolate a minimal, task-relevant observation and action space for each skill policy (see the second sketch after this list).
- Unlike end-to-end imitation of raw actuator commands, the method learns at the level of control references, producing smoother targets and less noisy learning signals (see the third sketch after this list).
- The approach is validated on an industrial forklift through statistically rigorous trials and demonstrates cross-platform generality on a Kinova Gen3 arm, highlighting its scalability, expert-free setup, and interpretability.
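To make the first step concrete, here is a minimal sketch of how demonstration segments labeled by a VLM could be merged into a state-transition graph. All names here (`SkillSegment`, `build_transition_graph`, the label strings) are hypothetical illustrations, not the paper's actual API; the sketch also assumes the VLM has already mapped equivalent high-level states to the same canonical string.

```python
from dataclasses import dataclass

@dataclass
class SkillSegment:
    """One VLM-labeled slice of a demonstration (hypothetical structure)."""
    skill: str        # skill class assigned by the VLM, e.g. "pick_pallet"
    pre_state: str    # high-level state before the skill, per the VLM
    post_state: str   # high-level state after the skill, per the VLM

def build_transition_graph(segments: list[SkillSegment]) -> dict[str, dict[str, str]]:
    """Collect skill transitions into an adjacency map {state: {skill: next_state}}."""
    graph: dict[str, dict[str, str]] = {}
    for seg in segments:
        graph.setdefault(seg.pre_state, {})[seg.skill] = seg.post_state
        graph.setdefault(seg.post_state, {})  # ensure terminal states appear as nodes
    return graph

# Toy example: one demonstration of a pallet-handling routine.
segments = [
    SkillSegment("approach", "at_home", "at_pallet"),
    SkillSegment("pick_pallet", "at_pallet", "holding_pallet"),
    SkillSegment("place_pallet", "holding_pallet", "pallet_stored"),
]
print(build_transition_graph(segments))
```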
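The graph-to-PDDL step can be sketched in the same spirit: each skill edge becomes an action whose precondition and effect are the states it connects. The paper performs this synthesis with an ASP solver; the direct string emission below (reusing the toy graph from the previous sketch) only illustrates the shape of the output, and the nullary-predicate encoding is an assumption.

```python
def graph_to_pddl(graph: dict[str, dict[str, str]], domain: str = "demo") -> str:
    """Emit a PDDL domain where every skill edge becomes one action.

    High-level states are modeled as nullary predicates; the real system
    synthesizes the domain via Answer Set Programming, not templating.
    """
    states = sorted(graph)
    lines = [f"(define (domain {domain})",
             "  (:predicates " + " ".join(f"({s})" for s in states) + ")"]
    for pre, edges in graph.items():
        for skill, post in edges.items():
            lines += [f"  (:action {skill}",
                      f"    :precondition ({pre})",
                      f"    :effect (and ({post}) (not ({pre}))))"]
    lines.append(")")
    return "\n".join(lines)

print(graph_to_pddl(build_transition_graph(segments)))
```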
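Finally, learning at the control-reference level means the policy predicts smooth setpoints that a low-level tracker follows, rather than raw actuator commands. The single-joint sketch below is purely illustrative: `policy_reference` stands in for the learned policy, and the PD gains and unit-inertia dynamics are made-up assumptions, not values from the paper.

```python
import numpy as np

def policy_reference(t: float) -> float:
    """Hypothetical learned policy output: a smooth joint-position reference."""
    return 0.5 * (1 - np.cos(np.pi * min(t, 1.0)))  # eases from 0 to 1 rad over 1 s

def pd_track(q: float, qd: float, q_ref: float, kp: float = 40.0, kd: float = 8.0) -> float:
    """Low-level PD controller turns the smooth reference into an actuator torque."""
    return kp * (q_ref - q) - kd * qd

# Simulate a unit-inertia joint tracking the reference for 2 seconds.
dt, q, qd = 0.01, 0.0, 0.0
for step in range(200):
    tau = pd_track(q, qd, policy_reference(step * dt))
    qd += tau * dt   # unit inertia: acceleration equals torque
    q += qd * dt
print(f"final position: {q:.3f} rad (target 1.000)")
```

The design point is that the imitation loss is applied to the smooth reference signal, while the PD loop absorbs high-frequency actuation detail, which is what yields the smoother targets and less noisy learning signal noted above.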