Frequency-Enhanced Diffusion Models: Curriculum-Guided Semantic Alignment for Zero-Shot Skeleton Action Recognition

arXiv cs.CV / 4/13/2026



Key Points

The paper targets Zero-Shot Skeleton Action Recognition (ZSAR), aiming to generalize to unseen actions without exhaustive skeleton annotations by aligning motion semantics between skeleton signals and text prompts.
It proposes Frequency-Aware Diffusion for Skeleton-Text Matching (FDSM), which counters diffusion models’ spectral bias (their tendency to oversmooth high-frequency motion dynamics) through frequency-aware training and architectural additions.
FDSM integrates a Semantic-Guided Spectral Residual Module, a Timestep-Adaptive Spectral Loss, and curriculum-based semantic abstraction to better recover fine-grained motion details.
The method reports state-of-the-art performance on multiple skeleton action datasets, including NTU RGB+D, PKU-MMD, and Kinetics-skeleton.
The authors release code and a project homepage, enabling replication and further experimentation by the community.
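
To make the "spectral bias" point above concrete, here is a minimal sketch of how oversmoothing removes high-frequency energy from a motion signal. It is not taken from the paper or its code: the toy trajectory, the moving-average "model", the cutoff bin, and the helper names (`dft_power`, `high_freq_ratio`, `moving_average`) are all assumptions chosen for illustration.

```python
import cmath
import math


def dft_power(signal):
    """Power |X[k]|^2 for the non-negative frequency bins of a real signal."""
    n = len(signal)
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        power.append(abs(coeff) ** 2)
    return power


def high_freq_ratio(signal, cutoff_bin):
    """Fraction of non-DC spectral energy at or above cutoff_bin."""
    power = dft_power(signal)[1:]  # drop the DC bin; index i now holds bin i + 1
    total = sum(power)
    if total == 0:
        return 0.0
    return sum(power[cutoff_bin - 1:]) / total


def moving_average(signal, window=5):
    """Centered moving average; a stand-in for an oversmoothing model (assumption)."""
    n = len(signal)
    half = window // 2
    out = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


# Toy 1-D joint trajectory over 64 frames: a slow sway (2 cycles)
# plus a fast, low-amplitude tremor (20 cycles).
frames = 64
trajectory = [
    math.sin(2 * math.pi * 2 * t / frames)
    + 0.5 * math.sin(2 * math.pi * 20 * t / frames)
    for t in range(frames)
]
smoothed = moving_average(trajectory, window=5)

ratio_raw = high_freq_ratio(trajectory, cutoff_bin=10)
ratio_smooth = high_freq_ratio(smoothed, cutoff_bin=10)
print(f"high-frequency energy share, raw:      {ratio_raw:.3f}")
print(f"high-frequency energy share, smoothed: {ratio_smooth:.3f}")
```

The smoothed trajectory keeps the slow sway but loses most of the fast tremor, which is the kind of fine-grained detail a frequency-aware method would aim to preserve.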

Abstract

Human action recognition is pivotal in computer vision, with applications ranging from surveillance to human-robot interaction. Despite the effectiveness of supervised skeleton-based methods, their reliance on exhaustive annotation limits generalization to novel actions. Zero-Shot Skeleton Action Recognition (ZSAR) emerges as a promising paradigm, yet it faces challenges due to the spectral bias of diffusion models, which oversmooth high-frequency dynamics. Here, we propose Frequency-Aware Diffusion for Skeleton-Text Matching (FDSM), which integrates a Semantic-Guided Spectral Residual Module, a Timestep-Adaptive Spectral Loss, and Curriculum-based Semantic Abstraction to address these challenges. Our approach effectively recovers fine-grained motion details, achieving state-of-the-art performance on the NTU RGB+D, PKU-MMD, and Kinetics-skeleton datasets. Code is available at https://github.com/yuzhi535/FDSM. Project homepage: https://yuzhi535.github.io/FDSM.github.io/
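
As a rough illustration of what zero-shot skeleton-text matching means at inference time, the sketch below assigns a skeleton embedding to the unseen class whose text-prompt embedding is most similar. This is generic nearest-prompt matching by cosine similarity, not FDSM's diffusion-based procedure; the 3-dimensional embeddings, the class names, and the helper `zero_shot_predict` are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def zero_shot_predict(skeleton_embedding, text_embeddings):
    """Return the label whose text embedding best matches the skeleton embedding."""
    scores = {
        label: cosine_similarity(skeleton_embedding, emb)
        for label, emb in text_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Hypothetical text-prompt embeddings for classes never seen during training.
unseen_text_embeddings = {
    "wave hand": [0.9, 0.1, 0.0],
    "kick": [0.0, 0.2, 0.9],
    "sit down": [0.1, 0.9, 0.1],
}

# Hypothetical embedding of a skeleton sequence from some encoder.
skeleton_embedding = [0.8, 0.2, 0.1]

predicted_label, scores = zero_shot_predict(skeleton_embedding, unseen_text_embeddings)
print(predicted_label)
```

No skeleton sample of the predicted class is needed at training time; the class only has to be describable in text, which is what makes the setting zero-shot.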