ECHO: Edge-Cloud Humanoid Orchestration for Language-to-Motion Control
arXiv cs.CV · March 18, 2026
Key Points
- The paper presents ECHO, an edge-cloud framework for language-driven whole-body control of humanoid robots, linking a cloud-side diffusion-based text-to-motion generator with an edge-side RL tracker in a closed loop.
- Motion is encoded in a compact 38-dimensional representation and generated by a 1D UNet with cross-attention over CLIP text features, enabling fast inference (about one second on a cloud GPU with 10 denoising steps).
- The tracker uses a teacher-student paradigm with sim-to-real transfer via an evidential adaptation module, domain randomization, and symmetry constraints, plus an autonomous fall-recovery mechanism driven by the onboard IMU and a library of recovery trajectories.
- Evaluations on HumanML3D show strong generation quality (FID 0.029, R-Precision Top-1 0.686), while real-world tests on a Unitree G1 demonstrate stable command execution without hardware fine-tuning.
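To make the generation step concrete, here is a minimal, self-contained sketch of DDIM-style sampling over the paper's reported 10 denoising steps in the 38-dimensional motion space. Only the 38-dim representation, the 10-step count, and the CLIP-feature conditioning come from the summary above; the sequence length, the noise schedule, and the `toy_denoiser` stand-in for the paper's 1D UNet are hypothetical placeholders, not the authors' model.

```python
import numpy as np

MOTION_DIM = 38   # compact per-frame motion representation (from the paper)
SEQ_LEN = 64      # hypothetical sequence length, not specified in the summary
NUM_STEPS = 10    # denoising steps reported for ~1 s cloud inference

def toy_denoiser(x, step, text_feat):
    """Stand-in for the paper's 1D UNet with cross-attention on CLIP
    features. Here it just derives a fake conditioning target from the
    text embedding; a real model would be a trained network."""
    target = np.tanh(text_feat[:MOTION_DIM])   # toy text-conditioned signal
    return x - target                          # "predicted noise" (toy)

def ddim_sample(text_feat, seed=0):
    """Deterministic DDIM-style sampling: start from Gaussian noise and
    denoise over NUM_STEPS steps along a toy alpha-bar schedule."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((SEQ_LEN, MOTION_DIM))
    # alpha-bar rises from near 0 (pure noise) to near 1 (clean sample)
    abar = np.linspace(0.02, 0.9999, NUM_STEPS + 1)
    for i in range(NUM_STEPS):
        a, a_next = abar[i], abar[i + 1]
        eps = toy_denoiser(x, i, text_feat)
        x0 = (x - np.sqrt(1.0 - a) * eps) / np.sqrt(a)   # predicted clean motion
        x = np.sqrt(a_next) * x0 + np.sqrt(1.0 - a_next) * eps
    return x

# Usage: a fake 512-dim "CLIP" text feature stands in for the real encoder.
text_feat = np.random.default_rng(1).standard_normal(512)
motion = ddim_sample(text_feat)
print(motion.shape)  # (64, 38)
```

The edge-side tracker would then consume such a motion sequence frame by frame; the point of the 10-step schedule is that fewer denoising iterations trade a little sample quality for the roughly one-second cloud latency the closed loop needs.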