AnySlot: Goal-Conditioned Vision-Language-Action Policies for Zero-Shot Slot-Level Placement

arXiv cs.RO / April 14, 2026


Key Points

  • AnySlot is a goal-conditioned vision-language-action framework that improves zero-shot slot-level placement by inserting an explicit spatial visual goal between language grounding and robot control.
  • The method converts the language instruction into a scene-marker visual goal, then executes that goal with a goal-conditioned VLA policy, yielding more reliable semantic slot selection and greater spatial robustness.
  • The paper addresses the absence of suitable evaluation data by introducing SlotBench, a simulation benchmark with nine task categories focused on structured spatial reasoning for slot-level placement.
  • Experiments report that AnySlot outperforms flat VLA baselines and prior modular grounding approaches on precision-demanding placement tasks requiring sub-centimeter accuracy.
  • Overall, the work proposes a hierarchical decoupling of high-level slot selection from low-level execution to reduce compositional complexity in robotic manipulation instructions.

Abstract

Vision-Language-Action (VLA) policies have emerged as a versatile paradigm for generalist robotic manipulation. However, precise object placement under compositional language instructions remains a major challenge for modern monolithic VLA policies. Slot-level tasks require both reliable slot grounding and sub-centimeter execution accuracy. To this end, we propose AnySlot, a framework that reduces compositional complexity by introducing an explicit spatial visual goal as an intermediate representation between language grounding and control. AnySlot turns language into an explicit visual goal by generating a scene marker, then executes this goal with a goal-conditioned VLA policy. This hierarchical design effectively decouples high-level slot selection from low-level execution, ensuring both semantic accuracy and spatial robustness. Furthermore, recognizing the lack of existing benchmarks for such precision-demanding tasks, we introduce SlotBench, a comprehensive simulation benchmark featuring nine task categories tailored to evaluate structured spatial reasoning in slot-level placement. Extensive experiments show that AnySlot significantly outperforms flat VLA baselines and previous modular grounding methods in zero-shot slot-level placement.
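The hierarchical decoupling the abstract describes can be made concrete with a minimal sketch. Everything below is illustrative, not the paper's actual API: stage 1 stands in for a VLM that grounds the instruction into an explicit visual goal (a scene marker), and stage 2 stands in for the goal-conditioned VLA policy, which sees only the marker and never the language.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class SceneMarker:
    """Explicit spatial visual goal: a 2D pixel location for the target slot.
    (Hypothetical representation for illustration.)"""
    u: float
    v: float

def ground_instruction(instruction: str,
                       slots: Dict[str, Tuple[float, float]]) -> SceneMarker:
    """Stage 1 (stand-in): map language to a slot marker.
    A real system would use a vision-language model; here we simply
    look up a slot name mentioned in the instruction."""
    for name, (u, v) in slots.items():
        if name in instruction:
            return SceneMarker(u, v)
    raise ValueError("no known slot mentioned in instruction")

def goal_conditioned_policy(gripper: Tuple[float, float],
                            marker: SceneMarker) -> Tuple[float, float]:
    """Stage 2 (stand-in): move the gripper one step toward the marker.
    Note the language never reaches this stage -- only the visual goal
    does, which is the decoupling AnySlot argues for."""
    step = 0.5  # fraction of the remaining error closed per step
    dx = marker.u - gripper[0]
    dy = marker.v - gripper[1]
    return (gripper[0] + step * dx, gripper[1] + step * dy)

# Toy scene with two candidate slots (pixel coordinates).
slots = {"left slot": (10.0, 20.0), "right slot": (90.0, 20.0)}
marker = ground_instruction("place the block in the left slot", slots)

pose = (50.0, 50.0)
for _ in range(20):
    pose = goal_conditioned_policy(pose, marker)
print(round(pose[0], 3), round(pose[1], 3))  # → 10.0 20.0
```

The point of the sketch is the interface boundary: swapping in a harder instruction only changes stage 1's output marker, while stage 2's control loop is unchanged, mirroring how AnySlot separates high-level slot selection from low-level execution.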