Viewpoint-Agnostic Grasp Pipeline using VLM and Partial Observations
arXiv cs.RO / 5/6/2026
💬 Opinion · Developer Stack & Infrastructure · Models & Research
Key Points
- The paper introduces an end-to-end, language-guided grasping pipeline for mobile legged manipulators operating in cluttered scenes where occlusions cause partial observations and unreliable depth.
- It links open-vocabulary target selection from a natural-language command to safe real-robot grasp execution by using RGB grounding (open-vocabulary detection and promptable instance segmentation) plus object-centric point-cloud extraction from RGB-D.
- To handle occlusion-related geometric failures, the method applies back-projected depth compensation and a two-stage point-cloud completion process before generating grasp candidates.
- It then produces and filters 6-DoF grasp candidates with collision checking and safety-oriented heuristics focused on reachability, approach feasibility, and clearance.
- Experiments on a quadruped robot equipped with an arm, run in two cluttered tabletop setups, show a 90% overall success rate (9/10 trials) versus 30% (3/10) for a view-dependent baseline, highlighting robustness to partial observations.
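The object-centric point-cloud extraction step above relies on standard pinhole back-projection: each depth pixel inside the segmentation mask is lifted into a 3-D camera-frame point, and pixels with missing or unreliable depth are dropped, which is exactly the gap the paper's depth compensation and two-stage completion then fill. The sketch below is a minimal, hypothetical illustration of that back-projection (function name, data layout, and the zero-depth convention are assumptions, not the paper's implementation):

```python
def backproject_mask(depth, mask, fx, fy, cx, cy):
    """Lift masked pixels of a depth image into 3-D camera-frame points.

    depth: 2-D list of metric depth values (0.0 = missing/unreliable reading)
    mask:  2-D list of bools from the instance-segmentation stage
    fx, fy, cx, cy: pinhole camera intrinsics
    Returns a list of (X, Y, Z) points. Pixels with invalid depth are
    skipped, leaving holes that a completion stage would need to fill.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if not mask[v][u] or z <= 0.0:
                continue  # outside the target, or a depth hole
            # Standard pinhole model: X = (u - cx) * z / fx, etc.
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points
```

Under occlusion the returned cloud is only a partial surface of the object, which motivates completing it before grasp synthesis rather than planning on the raw points.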
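The safety-oriented filtering of 6-DoF grasp candidates can be pictured as simple geometric gating on each candidate's approach direction and free-space clearance. The following is a hedged sketch of one such heuristic, not the paper's actual filter: the dict keys, the downward table normal, and the threshold values are all illustrative assumptions.

```python
import math

def filter_grasps(candidates, max_tilt_deg=45.0, min_clearance=0.02):
    """Keep grasp candidates whose approach direction stays within
    max_tilt_deg of the table normal and whose clearance to neighbouring
    clutter exceeds min_clearance (metres).

    candidates: list of dicts with 'approach' (unit 3-vector, camera/world
    frame) and 'clearance' (metres). Thresholds are illustrative only.
    """
    cos_max = math.cos(math.radians(max_tilt_deg))
    kept = []
    for g in candidates:
        ax, ay, az = g["approach"]
        # Dot product of the approach vector with straight-down (0, 0, -1).
        cos_tilt = -az
        if cos_tilt >= cos_max and g["clearance"] >= min_clearance:
            kept.append(g)
    return kept
```

A real system would combine such gates with full collision checking against the scene cloud and a reachability test from the arm's inverse kinematics; the heuristic above only prunes obviously unsafe candidates cheaply.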