AURA: Multimodal Shared Autonomy for Real-World Urban Navigation

arXiv cs.RO / 4/3/2026

Key Points

  • The paper introduces AURA, a new multimodal shared-autonomy framework for long-horizon urban navigation that splits tasks into high-level human instructions and low-level AI control.
  • AURA uses a Spatial-Aware Instruction Encoder to better align visual and spatial context with diverse vision-language human instructions.
  • It proposes MM-CoS, a large-scale training dataset that combines teleoperation data with vision-language descriptions to enable learning under realistic instruction scenarios.
  • Experiments in both simulation and real-world settings show improved navigation stability and instruction-following, alongside online adaptation capabilities.
  • Under comparable takeover conditions, the shared-autonomy approach reduces the frequency of human takeovers by more than 44%, suggesting a measurable reduction in operator burden and fatigue.

Abstract

Long-horizon navigation in complex urban environments relies heavily on continuous human operation, which leads to fatigue, reduced efficiency, and safety concerns. Shared autonomy, where a Vision-Language AI agent and a human operator collaborate to maneuver the mobile machine, presents a promising solution to these issues. However, existing shared autonomy methods often require humans and AI to operate within the same action space, leading to high cognitive overhead. We present Assistive Urban Robot Autonomy (AURA), a new multimodal framework that decomposes urban navigation into high-level human instruction and low-level AI control. AURA incorporates a Spatial-Aware Instruction Encoder to align varied human instructions with visual and spatial context. To facilitate training, we construct MM-CoS, a large-scale dataset comprising teleoperation data and vision-language descriptions. Experiments in simulation and the real world demonstrate that AURA effectively follows human instructions, reduces manual operation effort, and improves navigation stability, while enabling online adaptation. Moreover, under similar takeover conditions, our shared autonomy framework reduces the frequency of takeovers by more than 44%. A demo video and more details are provided on the project page.
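The core idea in the abstract, splitting the task so the human supplies high-level instructions while the agent owns low-level control, can be caricatured in a few lines. Everything below (the class, the rule-based policy standing in for the learned vision-language controller, the command format) is a hypothetical illustration, not AURA's actual interface:

```python
from dataclasses import dataclass

@dataclass
class Command:
    linear: float   # forward velocity, m/s
    angular: float  # yaw rate, rad/s

class SharedAutonomyAgent:
    """Minimal sketch of the high-level / low-level split:
    the human sets an instruction; the agent emits motor commands."""

    def __init__(self) -> None:
        self.instruction = "stop"

    def set_instruction(self, text: str) -> None:
        # High-level human input, e.g. "turn left at the crosswalk".
        # In AURA this would pass through the instruction encoder;
        # here we just store the raw string.
        self.instruction = text

    def step(self, obstacle_ahead: bool) -> Command:
        # Stand-in for the learned low-level policy: a real system
        # would condition on encoded visual and spatial context.
        if obstacle_ahead:
            return Command(0.0, 0.5)   # pause forward motion, steer away
        if "left" in self.instruction:
            return Command(0.5, 0.8)
        if "right" in self.instruction:
            return Command(0.5, -0.8)
        if "stop" in self.instruction:
            return Command(0.0, 0.0)
        return Command(1.0, 0.0)       # default: proceed straight

agent = SharedAutonomyAgent()
agent.set_instruction("turn left at the next corner")
cmd = agent.step(obstacle_ahead=False)
print(cmd.linear, cmd.angular)  # 0.5 0.8
```

The point of the sketch is the interface, not the policy: the human never touches `linear`/`angular` directly, which is what lets the two parties operate in different action spaces, the mismatch the paper identifies in prior shared-autonomy work.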