HEX: Humanoid-Aligned Experts for Cross-Embodiment Whole-Body Manipulation

arXiv cs.RO / 4/10/2026


Key Points

  • HEX introduces a state-centric framework for stable whole-body manipulation on full-sized bipedal humanoid robots, addressing the instability that arises when VLA models treat body parts independently.
  • The approach uses a humanoid-aligned universal state representation to enable scalable learning across heterogeneous robot embodiments and incorporates a Mixture-of-Experts proprioceptive predictor for coordinated motion modeling.
  • HEX leverages lightweight history tokens to retain temporal visual context efficiently, reducing the need to repeatedly encode past images during inference.
  • A residual-gated fusion mechanism combined with a flow-matching action head integrates visual-language cues with proprioceptive dynamics to generate actions.
  • Real-world humanoid manipulation experiments report state-of-the-art task success and improved generalization, especially for fast-reaction and long-horizon tasks.
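The Mixture-of-Experts predictor in the second bullet can be illustrated with a minimal routing sketch. This is not HEX's implementation (the paper summary gives no architectural specifics); it is a generic top-k MoE layer over a proprioceptive state vector, with made-up dimensions and randomly initialized weights standing in for trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, HIDDEN, N_EXPERTS, TOP_K = 32, 64, 4, 2  # illustrative sizes only

# Random weights stand in for trained gate and expert parameters.
W_gate = rng.normal(scale=0.1, size=(STATE_DIM, N_EXPERTS))
W_experts = rng.normal(scale=0.1, size=(N_EXPERTS, STATE_DIM, HIDDEN))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_predict(state):
    """Route a whole-body proprioceptive state to its top-k experts
    and return the gate-weighted mix of their outputs."""
    logits = state @ W_gate                    # one score per expert
    top = np.argsort(logits)[-TOP_K:]          # indices of selected experts
    weights = softmax(logits[top])             # renormalized gate weights
    return sum(w * np.tanh(state @ W_experts[i]) for w, i in zip(weights, top))

state = rng.normal(size=STATE_DIM)             # stand-in humanoid state
features = moe_predict(state)
print(features.shape)                          # (64,)
```

The intuition matching the summary: different experts can specialize in different regimes of whole-body coordination, while the gate routes each state to the relevant specialists.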

Abstract

Humans achieve complex manipulation through coordinated whole-body control, whereas most Vision-Language-Action (VLA) models treat robot body parts largely independently, making high-DoF humanoid control challenging and often unstable. We present HEX, a state-centric framework for coordinated manipulation on full-sized bipedal humanoid robots. HEX introduces a humanoid-aligned universal state representation for scalable learning across heterogeneous embodiments, and incorporates a Mixture-of-Experts Unified Proprioceptive Predictor to model whole-body coordination and temporal motion dynamics from large-scale multi-embodiment trajectory data. To efficiently capture temporal visual context, HEX uses lightweight history tokens to summarize past observations, avoiding repeated encoding of historical images during inference. It further employs a residual-gated fusion mechanism with a flow-matching action head to adaptively integrate visual-language cues with proprioceptive dynamics for action generation. Experiments on real-world humanoid manipulation tasks show that HEX achieves state-of-the-art performance in task success rate and generalization, particularly in fast-reaction and long-horizon scenarios.
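The abstract's action-generation pipeline (residual-gated fusion feeding a flow-matching head) can be sketched in a few lines. Everything below is a hedged toy, not HEX's code: `fuse` implements a generic scalar-gated residual merge of visual-language and proprioceptive features, and `sample_action` integrates a stand-in conditional velocity field from noise to an action via Euler steps, which is the standard way flow-matching models sample:

```python
import numpy as np

rng = np.random.default_rng(1)
ACTION_DIM, COND_DIM, N_STEPS = 12, 16, 8      # illustrative sizes only

# Random weights stand in for trained networks.
W_gate = rng.normal(scale=0.1, size=(2 * COND_DIM,))
W_v = rng.normal(scale=0.1, size=(ACTION_DIM + COND_DIM + 1, ACTION_DIM))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse(vis_lang, proprio):
    """Residual-gated fusion: add proprioceptive features onto the
    visual-language features, scaled by a learned scalar gate."""
    g = sigmoid(np.concatenate([vis_lang, proprio]) @ W_gate)
    return vis_lang + g * proprio

def velocity(action, t, cond):
    """Conditional velocity field v(a, t | cond) -- a linear stand-in."""
    return np.concatenate([action, cond, [t]]) @ W_v

def sample_action(cond):
    """Integrate the learned flow from Gaussian noise to an action."""
    a = rng.normal(size=ACTION_DIM)
    for i in range(N_STEPS):
        a = a + velocity(a, i / N_STEPS, cond) / N_STEPS
    return a

cond = fuse(rng.normal(size=COND_DIM), rng.normal(size=COND_DIM))
action = sample_action(cond)
print(action.shape)                            # (12,)
```

The gate lets the model lean on proprioception only when it helps (e.g. fast-reaction tasks), while the iterative flow integration replaces a single-shot regression of the action.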