Spectral Scalpel: Amplifying Adjacent Action Discrepancy via Frequency-Selective Filtering for Skeleton-Based Action Segmentation

arXiv cs.CV / 3/26/2026

Key Points

  • The paper addresses two limitations of skeleton-based Temporal Action Segmentation (STAS): adjacent actions are often hard to distinguish from their spatio-temporal patterns, and the boundaries between them are blurred.
  • It proposes “Spectral Scalpel,” a frequency-selective filtering framework that suppresses shared frequency components between neighboring actions while amplifying action-specific frequencies to increase inter-action discrepancy.
  • The method uses adaptive multi-scale spectral filters and a discrepancy loss focused on adjacent actions, aiming to sharpen transition boundaries and reduce inter-class confusion.
  • To complement long-term temporal modeling, it adds a frequency-aware channel mixer that aggregates spectra across channels to strengthen channel evolution.
  • Experiments across five public datasets show state-of-the-art performance, and the authors provide an open-source codebase for reproducibility.
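
The core idea in the points above, suppressing frequencies that neighboring actions share while amplifying action-specific ones, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`dft`, `idft`, `spectral_filter`) and the fixed `gains` vector are assumptions for illustration, and the sketch handles a single 1-D feature channel, whereas the paper's filters are described as adaptive and multi-scale.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform of a real-valued sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT; returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def spectral_filter(signal, gains):
    """Scale each frequency bin of `signal` by a real gain.

    `gains` holds one value per non-negative frequency bin (0 .. n // 2).
    In a learned model these gains would be trainable parameters; here they
    are fixed by hand (an assumption made for illustration).
    """
    n = len(signal)
    spectrum = dft(signal)
    filtered = []
    for k in range(n):
        # Mirror bin k onto its non-negative counterpart so that conjugate
        # pairs get the same gain and the output stays real-valued.
        bin_index = k if k <= n // 2 else n - k
        filtered.append(spectrum[k] * gains[bin_index])
    return idft(filtered)


# Toy signal: a component "shared" by two actions (bin 1) plus an
# "action-specific" component (bin 4).
n = 16
signal = [
    math.cos(2 * math.pi * 1 * t / n) + math.cos(2 * math.pi * 4 * t / n)
    for t in range(n)
]
gains = [1.0] * (n // 2 + 1)
gains[1] = 0.1  # suppress the shared frequency
gains[4] = 2.0  # amplify the action-specific frequency
edited = spectral_filter(signal, gains)
```

After filtering, the shared component is attenuated to a tenth of its amplitude and the action-specific component is doubled, which is the kind of spectrum editing the summary describes.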

Abstract

Skeleton-based Temporal Action Segmentation (STAS) seeks to densely segment and classify diverse actions within long, untrimmed skeletal motion sequences. However, existing STAS methodologies face challenges of limited inter-class discriminability and blurred segmentation boundaries, primarily due to insufficient distinction of spatio-temporal patterns between adjacent actions. To address these limitations, we propose Spectral Scalpel, a frequency-selective filtering framework aimed at suppressing shared frequency components between adjacent distinct actions while amplifying their action-specific frequencies, thereby enhancing inter-action discrepancies and sharpening transition boundaries. Specifically, Spectral Scalpel employs adaptive multi-scale spectral filters as scalpels to edit frequency spectra, coupled with a discrepancy loss between adjacent actions serving as the surgical objective. This design amplifies representational disparities between neighboring actions, effectively mitigating boundary localization ambiguities and inter-class confusion. Furthermore, complementing long-term temporal modeling, we introduce a frequency-aware channel mixer to strengthen channel evolution by aggregating spectra across channels. This work presents a novel paradigm for STAS that extends conventional spatio-temporal modeling by incorporating frequency-domain analysis. Extensive experiments on five public datasets demonstrate that Spectral Scalpel achieves state-of-the-art performance. Code is available at https://github.com/HaoyuJi/SpecScalpel.
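
The abstract does not state the form of the discrepancy loss, so the following is only one plausible sketch of a loss "between adjacent actions": at each label boundary it compares the amplitude spectra of the frames just before and just after the boundary, and returns their mean cosine similarity. Minimizing such a value would push neighboring actions toward dissimilar spectra. The names `amplitude_spectrum`, `cosine_similarity`, `adjacent_discrepancy_loss`, and the `window` parameter are hypothetical, and the sketch assumes a single scalar feature per frame.

```python
import cmath
import math


def amplitude_spectrum(frames):
    """Magnitudes of the non-negative DFT bins of a real-valued sequence."""
    n = len(frames)
    return [
        abs(sum(frames[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm > 0 else 0.0


def adjacent_discrepancy_loss(features, labels, window=8):
    """Mean spectral similarity across action boundaries (lower is better).

    `features` is a per-frame scalar sequence and `labels` the per-frame
    action labels. Boundaries without `window` frames on both sides are
    skipped. This is an illustrative stand-in, not the paper's loss.
    """
    similarities = []
    for t in range(1, len(labels)):
        is_boundary = labels[t] != labels[t - 1]
        has_room = t - window >= 0 and t + window <= len(labels)
        if is_boundary and has_room:
            before = amplitude_spectrum(features[t - window:t])
            after = amplitude_spectrum(features[t:t + window])
            similarities.append(cosine_similarity(before, after))
    return sum(similarities) / len(similarities) if similarities else 0.0


# Two adjacent actions of 8 frames each.
labels = [0] * 8 + [1] * 8

# Case 1: both actions oscillate at the same frequency (hard to separate).
same = [math.cos(2 * math.pi * 1 * t / 8) for t in range(16)]

# Case 2: the second action uses a different frequency (easy to separate).
different = [
    math.cos(2 * math.pi * (1 if t < 8 else 3) * t / 8) for t in range(16)
]

loss_same = adjacent_discrepancy_loss(same, labels)
loss_different = adjacent_discrepancy_loss(different, labels)
```

In this toy setup the loss is high when the two actions share a spectrum and low when their spectra differ, so a model trained against it would be rewarded for making adjacent actions spectrally distinct.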