Frequency-Enhanced Diffusion Models: Curriculum-Guided Semantic Alignment for Zero-Shot Skeleton Action Recognition
arXiv cs.CV / 4/13/2026
Key Points
- The paper targets Zero-Shot Skeleton Action Recognition (ZSAR), aiming to generalize to unseen actions without exhaustive skeleton annotations by aligning motion semantics between skeleton signals and text prompts.
- It proposes Frequency-Aware Diffusion for Skeleton-Text Matching (FDSM), addressing diffusion models’ spectral bias (oversmoothing high-frequency dynamics) with frequency-aware training and architectural additions.
- FDSM integrates a Semantic-Guided Spectral Residual Module, a Timestep-Adaptive Spectral Loss, and curriculum-based semantic abstraction to better recover fine-grained motion details.
- The method reports state-of-the-art performance on multiple skeleton action datasets, including NTU RGB+D, PKU-MMD, and Kinetics-skeleton.
- The authors release code and a project homepage, enabling replication and further experimentation by the community.
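
To make the idea of a timestep-dependent spectral loss more concrete, here is a minimal NumPy sketch. It is not the paper's formulation: the function name `timestep_adaptive_spectral_loss`, the linear frequency weighting, the `gamma` parameter, and the choice to emphasize high frequencies more at low-noise timesteps are all assumptions made for illustration. It only shows how a loss computed on the temporal spectrum of a skeleton sequence can penalize an oversmoothed prediction, and how that penalty can vary with the diffusion timestep.

```python
import numpy as np

def timestep_adaptive_spectral_loss(pred, target, t, num_steps, gamma=4.0):
    """Hypothetical spectral loss; not the formulation used in the paper.

    pred, target: arrays of shape (frames, joints, coords).
    t: diffusion timestep in [0, num_steps); smaller t is assumed to mean less noise.
    gamma: assumed strength of the high-frequency emphasis.
    """
    # Temporal spectrum of every joint coordinate (FFT along the frame axis).
    pred_f = np.fft.rfft(pred, axis=0, norm="ortho")
    target_f = np.fft.rfft(target, axis=0, norm="ortho")

    # Normalized frequency index in [0, 1] for each spectral bin.
    num_bins = pred_f.shape[0]
    freq = np.arange(num_bins) / max(num_bins - 1, 1)

    # Assumed schedule: high frequencies get more weight at low-noise timesteps.
    emphasis = gamma * (1.0 - t / num_steps)
    weights = 1.0 + emphasis * freq

    # Weighted squared error between the two spectra.
    err = np.abs(pred_f - target_f) ** 2
    return float(np.mean(weights[:, None, None] * err))


# Toy sequence: slow motion plus a small fast oscillation, 25 joints, 3 coords.
frames = 64
time = np.arange(frames) / frames
slow = np.sin(2 * np.pi * 1 * time)
fast = 0.2 * np.sin(2 * np.pi * 20 * time)
target = np.tile((slow + fast)[:, None, None], (1, 25, 3))
oversmoothed = np.tile(slow[:, None, None], (1, 25, 3))  # fast component lost

loss_low_noise = timestep_adaptive_spectral_loss(oversmoothed, target, t=0, num_steps=1000)
loss_high_noise = timestep_adaptive_spectral_loss(oversmoothed, target, t=900, num_steps=1000)
print(loss_low_noise > loss_high_noise)
```

In this toy setup the prediction differs from the target only in the fast oscillation, so the loss is larger at the low-noise timestep, where the assumed schedule weights high-frequency bins more heavily.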