ECHO: Edge-Cloud Humanoid Orchestration for Language-to-Motion Control
arXiv cs.CV · March 18, 2026
Key Points
- The paper presents ECHO, an edge-cloud framework for language-driven whole-body control of humanoid robots, linking a cloud-side diffusion-based text-to-motion generator with an edge-side RL tracker in a closed loop.
- Motion is encoded in a compact 38-dimensional representation and generated by a 1D UNet with cross-attention over CLIP text features, enabling fast inference (about one second on a cloud GPU with 10 denoising steps).
- The tracker uses a teacher-student paradigm with sim-to-real transfer via an evidential adaptation module, domain randomization, and symmetry constraints, plus an autonomous fall-recovery mechanism driven by the onboard IMU and a library of recovery trajectories.
- Evaluations on HumanML3D show strong generation quality (FID 0.029, R-Precision Top-1 0.686), while real-world tests on a Unitree G1 demonstrate stable command execution without hardware fine-tuning.
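To make the generation step concrete, here is a minimal, self-contained sketch of DDIM-style sampling over the paper's reported 10 denoising steps in the 38-dimensional motion space. Only the 38-dim representation, the 10-step count, and the CLIP-feature conditioning come from the summary above; the sequence length, the noise schedule, and the `toy_denoiser` stand-in for the paper's 1D UNet are hypothetical placeholders, not the authors' model.

```python
import numpy as np

MOTION_DIM = 38   # compact per-frame motion representation (from the paper)
SEQ_LEN = 64      # hypothetical sequence length, not specified in the summary
NUM_STEPS = 10    # denoising steps reported for ~1 s cloud inference

def toy_denoiser(x, step, text_feat):
    """Stand-in for the paper's 1D UNet with cross-attention on CLIP
    features. Here it just derives a fake conditioning target from the
    text embedding; a real model would be a trained network."""
    target = np.tanh(text_feat[:MOTION_DIM])   # toy text-conditioned signal
    return x - target                          # "predicted noise" (toy)

def ddim_sample(text_feat, seed=0):
    """Deterministic DDIM-style sampling: start from Gaussian noise and
    denoise over NUM_STEPS steps along a toy alpha-bar schedule."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((SEQ_LEN, MOTION_DIM))
    # alpha-bar rises from near 0 (pure noise) to near 1 (clean sample)
    abar = np.linspace(0.02, 0.9999, NUM_STEPS + 1)
    for i in range(NUM_STEPS):
        a, a_next = abar[i], abar[i + 1]
        eps = toy_denoiser(x, i, text_feat)
        x0 = (x - np.sqrt(1.0 - a) * eps) / np.sqrt(a)   # predicted clean motion
        x = np.sqrt(a_next) * x0 + np.sqrt(1.0 - a_next) * eps
    return x

# Usage: a fake 512-dim "CLIP" text feature stands in for the real encoder.
text_feat = np.random.default_rng(1).standard_normal(512)
motion = ddim_sample(text_feat)
print(motion.shape)  # (64, 38)
```

The edge-side tracker would then consume such a motion sequence frame by frame; the point of the 10-step schedule is that fewer denoising iterations trade a little sample quality for the roughly one-second cloud latency the closed loop needs.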