AnyUser: Translating Sketched User Intent into Domestic Robots
arXiv cs.RO / 4/7/2026
📰 News · Signals & Early Trends · Models & Research
Key Points
- AnyUser is proposed as a unified multimodal robotic instruction system that converts free-form sketches (optionally with language) on camera images into executable actions for domestic tasks.
- The approach combines spatial-semantic primitives, multimodal fusion of sketch, vision, and language inputs, and a hierarchical policy that generates robust action sequences without relying on prior maps or pre-trained models (an illustrative sketch of this pipeline follows the list).
- Quantitative evaluation on a large-scale benchmark dataset shows accurate interpretation of sketch-based commands across varied simulated home scenes.
- Real-world tests on two robot platforms—a stationary 7-DoF assistive arm (KUKA LBR iiwa) and a dual-arm mobile manipulator (Realman RMC-AIDAL)—demonstrate reliable grounding and execution for tasks such as targeted wiping and area cleaning.
- A user study with diverse demographics (including elderly and people with low technical literacy) shows improved usability and task specification efficiency, with high completion rates (85.7%–96.4%) and strong user satisfaction.
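For a concrete mental model of the pipeline the key points describe, here is a minimal, hypothetical Python sketch: strokes drawn on a camera image are turned into spatial-semantic primitives (a grounded image region plus an inferred intent), and a hierarchical policy expands each primitive into a high-level subtask and a low-level motion command. All names here (`SpatialSemanticPrimitive`, `extract_primitives`, `hierarchical_policy`) and the intent heuristic are illustrative assumptions, not the paper's actual API.

```python
# Hypothetical sketch of an AnyUser-style pipeline. Names and heuristics are
# illustrative assumptions, not the authors' implementation.
from dataclasses import dataclass
import numpy as np


@dataclass
class SpatialSemanticPrimitive:
    """A grounded region plus an inferred intent, e.g. 'targeted_wipe'."""
    region_mask: np.ndarray  # binary mask in image coordinates
    intent: str              # semantic label from sketch/language fusion


def extract_primitives(sketch_mask: np.ndarray,
                       language: str | None) -> list[SpatialSemanticPrimitive]:
    """Turn free-form strokes (plus optional text) into primitives.

    The semantic step is faked here with a size heuristic: a large marked
    region is treated as 'area_clean', a small one as 'targeted_wipe'. The
    real system would use learned multimodal fusion over sketch, vision,
    and language features.
    """
    intent = ("area_clean"
              if sketch_mask.sum() > 0.05 * sketch_mask.size
              else "targeted_wipe")
    if language:
        # Language, when present, overrides the sketch-only heuristic.
        intent = language.strip().lower().replace(" ", "_")
    return [SpatialSemanticPrimitive(region_mask=sketch_mask, intent=intent)]


def hierarchical_policy(prims: list[SpatialSemanticPrimitive]) -> list[str]:
    """High level: one subtask per primitive; low level: a motion per intent."""
    plan: list[str] = []
    for p in prims:
        ys, xs = np.nonzero(p.region_mask)
        center = (int(xs.mean()), int(ys.mean())) if len(xs) else (0, 0)
        plan.append(f"move_to{center}")
        plan.append(f"execute:{p.intent}")
    return plan


if __name__ == "__main__":
    # A user circles a spill on the camera image (a 10x10 blob in a 64x64 frame).
    mask = np.zeros((64, 64), dtype=bool)
    mask[20:30, 30:40] = True
    prims = extract_primitives(mask, language="wipe here")
    print(hierarchical_policy(prims))  # ['move_to(34, 24)', 'execute:wipe_here']
```

The heuristic inside `extract_primitives` stands in for the paper's learned fusion; the structural idea this sketch mirrors is that grounded primitives, not raw pixels, feed a two-level policy.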
Related Articles

- Black Hat Asia (AI Business)
- [R] The ECIH Model: Modeling Agentic Identity as an Emergent Relational State (Reddit r/MachineLearning)
- Google DeepMind Unveils Project Genie: The Dawn of Infinite AI-Generated Game Worlds (Dev.to)
- Artificial Intelligence and Life in 2030: The One Hundred Year Study on Artificial Intelligence (Dev.to)
- Stop waiting for Java to rebuild! AI IDEs + Zero-Latency Hot Reload = Magic (Dev.to)