SkeletonContext: Skeleton-side Context Prompt Learning for Zero-Shot Skeleton-based Action Recognition
arXiv cs.CV / 4/1/2026
Key Points
- The paper addresses zero-shot skeleton-based action recognition by tackling the semantic gap caused by missing contextual cues (e.g., objects) when aligning motion features with text embeddings.
- It proposes SkeletonContext, which injects language-driven context into skeleton representations via a Cross-Modal Context Prompt Module: masked contextual prompts are reconstructed by a pretrained language model, guided by LLM-derived signals.
- The method includes a Key-Part Decoupling Module to separate motion-relevant joints, improving robustness even when explicit object interactions are not present.
- Experiments on multiple benchmarks show state-of-the-art results in both conventional and generalized zero-shot settings, particularly for fine-grained actions that look visually similar.
- Overall, the approach demonstrates improved instance-level semantic grounding and cross-modal alignment by transferring contextual semantics from language into the skeleton encoder.
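The Key-Part Decoupling idea in the points above can be illustrated with a toy sketch. Note this is not the paper's implementation: it simply ranks joints by motion energy (mean squared frame-to-frame displacement) and splits the skeleton into motion-relevant and static parts; the function name, the top-k heuristic, and the array layout `(frames, joints, xyz)` are all illustrative assumptions.

```python
import numpy as np

def decouple_key_parts(skeleton, k=5):
    """Toy split of a skeleton sequence into motion-relevant and static joints.

    skeleton: array of shape (T, J, 3) — T frames, J joints, xyz coordinates.
    Returns (key_joints, other_joints), shapes (T, k, 3) and (T, J-k, 3).
    """
    disp = np.diff(skeleton, axis=0)           # (T-1, J, 3) frame-to-frame motion
    energy = (disp ** 2).sum(axis=-1).mean(axis=0)  # (J,) per-joint motion energy
    key_idx = np.argsort(energy)[::-1][:k]     # top-k most active joints
    mask = np.zeros(skeleton.shape[1], dtype=bool)
    mask[key_idx] = True
    return skeleton[:, mask], skeleton[:, ~mask]
```

A learned module would replace the hard top-k threshold with attention over joints, but the sketch conveys the decoupling step: downstream alignment with text embeddings can then focus on the motion-relevant subset.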