AnyUser: Translating Sketched User Intent into Domestic Robots

arXiv cs.RO / 4/7/2026

Key Points

  • AnyUser is proposed as a unified multimodal robotic instruction system that converts free-form sketches drawn on camera images, optionally combined with language, into executable actions for domestic tasks.
  • The approach interprets sketch, vision, and language inputs as spatial-semantic primitives via multimodal fusion, then uses a hierarchical policy to generate robust action sequences, requiring no prior maps or models (an illustrative sketch of this pipeline follows the Key Points list).
  • Quantitative benchmarks on a large-scale dataset show accurate interpretation of diverse sketch-based commands across varied simulated home scenes.
  • Real-world tests on two robot platforms—a stationary 7-DoF assistive arm (KUKA LBR iiwa) and a dual-arm mobile manipulator (Realman RMC-AIDAL)—demonstrate reliable grounding and execution for tasks such as targeted wiping and area cleaning.
  • A user study with diverse demographics (including elderly users and people with low technical literacy) shows improved usability and task specification efficiency, with high completion rates (85.7%–96.4%) and strong user satisfaction.
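
The paper's key points describe a pipeline rather than code, so the following is a minimal, illustrative Python sketch of how sketch, vision, and language inputs could be turned into spatial-semantic primitives and then into actions via a two-level policy. All names (SpatialSemanticPrimitive, fuse_inputs, HierarchicalPolicy), the rule-based fusion logic, and the example coordinates are assumptions for illustration only, not the authors' implementation.

```python
# Illustrative sketch only: hypothetical stand-ins for the components the
# key points describe (spatial-semantic primitives, multimodal fusion,
# hierarchical policy). Not the AnyUser implementation.
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SpatialSemanticPrimitive:
    """Pairs where (sketch pixels on the camera image) with what (intent)."""
    kind: str                         # e.g. "point", "region", or "path"
    pixels: List[Tuple[int, int]]     # sketch stroke in image coordinates
    label: Optional[str] = None       # optional semantic tag from language


def fuse_inputs(strokes: List[List[Tuple[int, int]]],
                utterance: Optional[str]) -> List[SpatialSemanticPrimitive]:
    """Toy multimodal fusion: classify each stroke by shape and attach any
    verb found in the optional utterance. A real system would use learned models."""
    verb = None
    if utterance:
        for candidate in ("wipe", "clean", "pick", "place"):
            if candidate in utterance.lower():
                verb = candidate
                break
    primitives = []
    for stroke in strokes:
        if len(stroke) == 1:
            kind = "point"
        elif stroke[0] == stroke[-1]:
            kind = "region"   # closed loop drawn around an area
        else:
            kind = "path"
        primitives.append(SpatialSemanticPrimitive(kind, stroke, verb))
    return primitives


class HierarchicalPolicy:
    """Toy two-level policy: the high level picks a skill per primitive,
    the low level expands it into waypoint commands."""

    def plan(self, primitives: List[SpatialSemanticPrimitive]) -> List[str]:
        actions = []
        for p in primitives:
            skill = p.label or ("sweep_area" if p.kind == "region" else "follow_path")
            actions.append(f"skill:{skill}")
            # Low level: one waypoint per sketched pixel (grossly simplified).
            actions += [f"move_to(u={u}, v={v})" for u, v in p.pixels]
        return actions


if __name__ == "__main__":
    # Hypothetical interaction: a user circles a spill on the table image
    # and says "please wipe here".
    circled_region = [(100, 120), (140, 118), (150, 160), (105, 165), (100, 120)]
    prims = fuse_inputs([circled_region], "please wipe here")
    for step in HierarchicalPolicy().plan(prims):
        print(step)
```

In this toy version the fusion step is keyword matching and the low-level policy emits one waypoint per sketched pixel; the actual system would replace both with learned models grounded in the camera image.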

Abstract

We introduce AnyUser, a unified robotic instruction system that lets users specify domestic tasks intuitively through free-form sketches on camera images, optionally combined with language. AnyUser interprets the multimodal inputs (sketch, vision, language) as spatial-semantic primitives and generates executable robot actions without requiring prior maps or models. Novel components include a multimodal fusion module for understanding and a hierarchical policy for robust action generation. Efficacy is shown via extensive evaluations: (1) quantitative benchmarks on a large-scale dataset showing high accuracy in interpreting diverse sketch-based commands across various simulated domestic scenes; (2) real-world validation on two distinct robotic platforms, a statically mounted 7-DoF assistive arm (KUKA LBR iiwa) and a dual-arm mobile manipulator (Realman RMC-AIDAL), performing representative tasks such as targeted wiping and area cleaning, confirming the system's ability to ground instructions and execute them reliably in physical environments; and (3) a comprehensive user study involving diverse demographics (elderly, simulated non-verbal, and low-technical-literacy participants) demonstrating significant improvements in usability and task specification efficiency, with high task completion rates (85.7%–96.4%) and strong user satisfaction. AnyUser bridges the gap between advanced robotic capabilities and the need for accessible, non-expert interaction, laying a foundation for practical assistive robots adaptable to real-world human environments.