AnySlot: Goal-Conditioned Vision-Language-Action Policies for Zero-Shot Slot-Level Placement
arXiv cs.RO / 4/14/2026
Key Points
- AnySlot is a goal-conditioned vision-language-action framework that improves zero-shot slot-level placement by inserting an explicit spatial visual goal between language grounding and robot control.
- The method converts a language instruction into a visual goal in the form of a scene marker, then executes that goal with a goal-conditioned VLA policy, which makes semantic slot selection more reliable and placement more spatially robust.
- The paper addresses the absence of suitable evaluation data by introducing SlotBench, a simulation benchmark with nine task categories focused on structured spatial reasoning for slot-level placement.
- Experiments report that AnySlot outperforms flat VLA baselines and prior modular grounding approaches on placement tasks that demand sub-centimeter precision.
- Overall, the work proposes a hierarchical decoupling of high-level slot selection from low-level execution to reduce compositional complexity in robotic manipulation instructions.
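
The two-stage decoupling described in the points above can be illustrated with a toy sketch. This is not the paper's implementation: every name here (`Slot`, `select_slot`, `render_goal`, `policy_step`, `place`) is hypothetical, keyword matching stands in for language grounding, a single marked pixel stands in for the scene marker, and a greedy pixel-wise step stands in for the learned goal-conditioned VLA policy. The only point it tries to show is that the low-level stage sees the visual goal and never the instruction.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """Hypothetical slot description: a name and its centre pixel (row, col)."""
    name: str
    pixel: tuple[int, int]


def select_slot(instruction: str, slots: list[Slot]) -> Slot:
    """Toy high-level stage: keyword matching stands in for language grounding."""
    text = instruction.lower()
    matches = [s for s in slots if s.name in text]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one matching slot, got {len(matches)}")
    return matches[0]


def render_goal(height: int, width: int, slot: Slot) -> list[list[int]]:
    """Encode the chosen slot as a visual goal: an image with one marked pixel."""
    goal = [[0] * width for _ in range(height)]
    row, col = slot.pixel
    goal[row][col] = 1
    return goal


def find_marker(goal: list[list[int]]) -> tuple[int, int]:
    """Locate the marked pixel in the goal image."""
    for r, line in enumerate(goal):
        for c, value in enumerate(line):
            if value:
                return (r, c)
    raise ValueError("goal image contains no marker")


def policy_step(position: tuple[int, int], goal: list[list[int]]) -> tuple[int, int]:
    """Toy low-level stage: move at most one pixel per axis toward the marker.

    The instruction is deliberately not an argument; only the visual goal is.
    """
    target_r, target_c = find_marker(goal)
    r, c = position
    dr = max(-1, min(1, target_r - r))
    dc = max(-1, min(1, target_c - c))
    return (r + dr, c + dc)


def place(instruction: str, slots: list[Slot], start: tuple[int, int],
          height: int = 8, width: int = 8, max_iters: int = 50) -> tuple[int, int]:
    """Run both stages: select a slot, render the goal, then step until converged."""
    goal = render_goal(height, width, select_slot(instruction, slots))
    position = start
    for _ in range(max_iters):
        next_position = policy_step(position, goal)
        if next_position == position:
            break
        position = next_position
    return position
```

Under these assumptions, calling `place("put the peg in the top-right slot", slots, start=(4, 4))` with a slot named `"top-right"` drives the position to that slot's pixel, while an instruction that names no slot (or more than one) fails at the selection stage before any motion is attempted.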

