Towards Universal Skeleton-Based Action Recognition
arXiv cs.CV / 4/21/2026
Key Points
- The paper targets “universal” skeleton-based action recognition for real-world robotics, where skeleton data is heterogeneous because it comes from both humans and humanoid robots.
- It introduces the Heterogeneous Open-Vocabulary (HOV) Skeleton dataset by integrating and refining multiple large-scale skeleton action datasets to support open-vocabulary settings.
- The authors propose a Transformer-based framework with unified skeleton representation, a motion encoder for skeletons, and multi-grained motion–text alignment.
- The approach uses multi-level contrastive learning (global, stream-specific, and fine-grained) to align learned motion representations with text embeddings.
- In experiments on common benchmarks with heterogeneous skeletons, the authors report improved effectiveness and generalization. The code is released on GitHub.
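
The multi-level contrastive alignment mentioned in the key points can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each level (global, stream-specific, fine-grained) yields a batch of paired motion and text embeddings, applies a standard symmetric InfoNCE loss at each level, and combines the levels with a weighted sum. The function names, the temperature value, the level weights, and the random stand-in embeddings are all hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def info_nce(motion, text, temperature=0.07):
    """Symmetric InfoNCE loss; row i of `motion` is paired with row i of `text`.

    The temperature is an assumed value, not one taken from the paper.
    """
    m = l2_normalize(motion)
    t = l2_normalize(text)
    logits = m @ t.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])        # matching pairs sit on the diagonal
    loss_m2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # motion -> text
    loss_t2m = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> motion
    return 0.5 * (loss_m2t + loss_t2m)


def multi_level_loss(pairs, weights):
    """Weighted sum of per-level InfoNCE losses.

    `pairs` maps a level name to a (motion, text) embedding pair;
    `weights` maps the same names to scalars (hypothetical values).
    """
    return sum(weights[name] * info_nce(m, t) for name, (m, t) in pairs.items())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch, dim = 8, 16
    # Random stand-ins for encoder outputs at each assumed level.
    pairs = {
        name: (rng.normal(size=(batch, dim)), rng.normal(size=(batch, dim)))
        for name in ("global", "stream", "fine")
    }
    weights = {"global": 1.0, "stream": 0.5, "fine": 0.5}
    print("total loss:", multi_level_loss(pairs, weights))
```

In this sketch every level reuses the same batch-level loss on different embeddings. A real fine-grained term would more likely compare body-part or frame-level features against word-level text features, which the summary above does not specify.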