MotionBricks: Scalable Real-Time Motions with Modular Latent Generative Model and Smart Primitives

arXiv cs.LG / 4/29/2026

Key Points

  • MotionBricks targets two gaps in generative motion synthesis: real-time scalability (maintaining quality and scale under tight compute budgets) and integration (fine-grained multimodal control through velocity commands, style selection, and keyframes, which text- or tag-driven models largely lack).
  • The system is built on a large-scale modular latent generative backbone that covers a dataset of 350,000+ motion clips with a single model, aiming for robust real-time generation.
  • It adds “smart primitives” as an intuitive, unified interface for authoring navigation and object interactions, enabling plug-and-play assembly of motion behaviors without expert animation knowledge (see the sketch after this list).
  • The authors report state-of-the-art motion quality across open-source and proprietary datasets, along with real-time performance of 15,000 FPS at 2 ms latency in quantitative tests.
  • They validate the framework in a production-level animation demo and extend it beyond animation by deploying it on the Unitree G1 humanoid robot, demonstrating real-time robotic control and generalization.
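
The paper does not ship code with this summary, so the following is only a toy sketch of what a “smart primitive” authoring interface could look like: composable behavior bricks chained together without animation expertise. Every name here (Primitive, Navigate, Interact, BehaviorGraph, and all parameters) is a hypothetical stand-in, not the authors' API.

```python
# Hypothetical sketch of a "smart primitives" authoring interface.
# Primitives are composable behavior units ("bricks") that a runtime
# would translate into conditioning signals for the generative backbone.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Primitive:
    """Base class: one authorable behavior unit."""
    name: str


@dataclass
class Navigate(Primitive):
    target: Tuple[float, float, float]  # world-space goal position
    speed: float = 1.4                  # velocity command, m/s


@dataclass
class Interact(Primitive):
    object_id: str                      # scene object to interact with
    contact_frame: int                  # keyframe index for precise contact


@dataclass
class BehaviorGraph:
    """Plug-and-play assembly: primitives chained like bricks."""
    primitives: List[Primitive] = field(default_factory=list)

    def then(self, p: Primitive) -> "BehaviorGraph":
        self.primitives.append(p)
        return self


# Authoring navigation plus object interaction without animation expertise:
behavior = (
    BehaviorGraph()
    .then(Navigate("walk_to_door", target=(4.0, 0.0, 2.5), speed=1.2))
    .then(Interact("open_door", object_id="door_01", contact_frame=45))
)
print([p.name for p in behavior.primitives])
```

A runtime would presumably translate such a graph into per-frame conditioning signals for the generative backbone; the paper itself only describes the interface at the level of the abstract below.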

Abstract

Despite transformative advances in generative motion synthesis, real-time interactive motion control remains dominated by traditional techniques. In this work, we identify two key challenges in bridging research and production: 1) Real-time scalability: Industry applications demand real-time generation of a vast repertoire of motion skills, while generative methods exhibit significant degradation in quality and scalability under real-time computation constraints, and 2) Integration: Industry applications demand fine-grained multimodal control involving velocity commands, style selection, and precise keyframes, a need largely unmet by existing text- or tag-driven models. To overcome these limitations, we introduce MotionBricks: a large-scale, real-time generative framework with a two-fold solution. First, we propose a large-scale modular latent generative backbone tailored for robust real-time motion generation, effectively modeling a dataset of over 350,000 motion clips with a single model. Second, we introduce smart primitives that provide a unified, robust, and intuitive interface for authoring both navigation and object interaction. Applications can be designed in a plug-and-play manner, like assembling bricks, without expert animation knowledge. Quantitatively, we show that MotionBricks produces state-of-the-art motion quality on open-source and proprietary datasets of various scales, while also achieving a real-time throughput of 15,000 FPS with 2 ms latency. We demonstrate the flexibility and robustness of MotionBricks in a complete production-level animation demo, covering navigation and object-scene interaction across various styles with a unified model. To showcase our framework's application beyond animation, we deploy MotionBricks on the Unitree G1 humanoid robot to demonstrate its flexibility and generalization for real-time robotic control.
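
To make the abstract's control interface more concrete, here is a minimal per-frame sketch of how the described signals (velocity commands, style selection, precise keyframes) might be packed and fed to a latent decoder. All dimensions, names, and the random-weight decoder below are assumptions for illustration; the actual backbone is not public.

```python
# Toy per-frame generation loop mirroring the abstract's control interface
# (velocity commands, style selection, keyframes). Shapes, names, and the
# random-weight "decoder" are illustrative assumptions, not the paper's model.
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 64    # assumed latent-state size
POSE_DIM = 69      # assumed pose size, e.g. 23 joints x 3 DoF
NUM_STYLES = 8     # assumed style vocabulary

COND_DIM = 2 + NUM_STYLES + POSE_DIM            # velocity + style + keyframe
W = rng.standard_normal((POSE_DIM, LATENT_DIM + COND_DIM)) * 0.01  # stand-in decoder


def encode_conditions(velocity, style_id, keyframe=None):
    """Pack the multimodal controls into one conditioning vector."""
    style = np.zeros(NUM_STYLES)
    style[style_id] = 1.0                        # style selection (one-hot)
    kf = keyframe if keyframe is not None else np.zeros(POSE_DIM)
    return np.concatenate([velocity, style, kf])  # keyframe slot for precise poses


z = np.zeros(LATENT_DIM)                          # rolling latent state
for frame in range(3):                            # one decode per rendered frame
    cond = encode_conditions(np.array([1.2, 0.0]), style_id=3)
    pose = W @ np.concatenate([z, cond])          # latent + controls -> pose
    z = 0.9 * z + 0.1 * rng.standard_normal(LATENT_DIM)  # toy latent rollout
    print(f"frame {frame}: first joints {pose[:3]}")
```

One note on the reported numbers: 15,000 FPS at 2 ms latency suggests batched inference, since 2 ms per call allows roughly 500 sequential calls per second, and 15,000 / 500 = 30 frames per call, e.g., many characters animated in one batch.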