Unsupervised Skeleton-Based Action Segmentation via Hierarchical Spatiotemporal Vector Quantization

arXiv cs.CV / 4/17/2026


Key Points

  • The paper introduces a hierarchical spatiotemporal vector quantization framework for unsupervised skeleton-based temporal action segmentation.
  • It uses two consecutive quantization levels: a lower level captures fine-grained subactions and a higher level aggregates them into action-level representations.
  • A purely spatial hierarchical variant, trained by reconstructing the input skeletons, already achieves strong results; incorporating temporal information alongside the spatial cues improves performance further.
  • The extended hierarchical spatiotemporal version performs multi-level clustering while also reconstructing the skeleton inputs and their corresponding timestamps.
  • Experiments on HuGaDB, LARa, and BABEL report new state-of-the-art performance and reduced segment-length bias in unsupervised action segmentation.

Abstract

We propose a novel hierarchical spatiotemporal vector quantization framework for unsupervised skeleton-based temporal action segmentation. We first introduce a hierarchical approach, which includes two consecutive levels of vector quantization. Specifically, the lower level associates skeletons with fine-grained subactions, while the higher level further aggregates subactions into action-level representations. Our hierarchical approach outperforms the non-hierarchical baseline, while primarily exploiting spatial cues by reconstructing input skeletons. Next, we extend our approach by leveraging both spatial and temporal information, yielding a hierarchical spatiotemporal vector quantization scheme. In particular, our hierarchical spatiotemporal approach performs multi-level clustering, while simultaneously recovering input skeletons and their corresponding timestamps. Lastly, extensive experiments on multiple benchmarks, including HuGaDB, LARa, and BABEL, demonstrate that our approach establishes a new state-of-the-art performance and reduces segment length bias in unsupervised skeleton-based temporal action segmentation.
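The core idea of the two-level scheme can be sketched in a few lines: skeleton frames are first assigned to a fine-grained (subaction) codebook, and the resulting codes are then quantized again by a coarser (action-level) codebook. The minimal NumPy sketch below uses random features and fixed codebooks purely for illustration; the feature dimensions, codebook sizes, and reconstruction objective are placeholders, not the authors' actual architecture or training procedure.

```python
import numpy as np

def quantize(x, codebook):
    """Assign each row of x to its nearest codebook entry (squared L2 distance)."""
    d = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K) distances
    idx = d.argmin(1)                                          # nearest code index per row
    return idx, codebook[idx]                                  # indices and quantized vectors

rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 16))      # toy stand-in: 100 skeleton frames, 16-D features
sub_codebook = rng.normal(size=(8, 16))  # lower level: 8 fine-grained subaction codes
act_codebook = rng.normal(size=(3, 16))  # higher level: 3 action-level codes

# Level 1: frames -> subactions; Level 2: quantized subactions -> actions.
sub_idx, sub_q = quantize(frames, sub_codebook)
act_idx, act_q = quantize(sub_q, act_codebook)

# Toy analogue of the spatial reconstruction objective (reconstructing input skeletons).
recon_error = np.mean((frames - sub_q) ** 2)
```

Because the action-level assignment depends only on the subaction code, frames mapped to the same subaction always land in the same action cluster, which is the aggregation behavior the hierarchy is meant to induce.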