FG-SGL: Fine-Grained Semantic Guidance Learning via Motion Process Decomposition for Micro-Gesture Recognition
arXiv cs.CV / 3/18/2026
Key Points
- FG-SGL is a framework that jointly integrates fine-grained and category-level semantics to guide vision-language models for micro-gesture recognition, addressing the subtle inter-class motion variations that make micro-gestures hard to distinguish.
- FG-SGL comprises two modules: FG-SA, which leverages fine-grained semantic cues to learn local motion features, and CP-A, which improves feature separability through category-level semantic guidance.
- To support fine-grained guidance, the authors construct a human-annotated textual dataset describing the dynamic process of each micro-gesture across four refined semantic dimensions.
- A Multi-Level Contrastive Optimization strategy jointly optimizes both modules in a coarse-to-fine manner, and experiments show competitive performance.
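The coarse-to-fine optimization described above can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the paper's implementation: the function names (`info_nce`, `multi_level_loss`), the symmetric InfoNCE formulation, and the `alpha` weighting between the category-level (CP-A-style) and fine-grained (FG-SA-style) terms are all hypothetical stand-ins for whatever losses FG-SGL actually uses.

```python
# Hedged sketch of a multi-level contrastive objective combining a coarse
# (category-level) term and a fine-grained term. All names and the weighting
# scheme are assumptions; the paper's actual losses may differ.
import numpy as np

def info_nce(vision, text, temperature=0.07):
    """Symmetric InfoNCE loss between two batches of paired embeddings."""
    v = vision / np.linalg.norm(vision, axis=1, keepdims=True)
    t = text / np.linalg.norm(text, axis=1, keepdims=True)
    logits = v @ t.T / temperature            # (B, B) cosine-similarity matrix
    idx = np.arange(len(v))                   # matched pairs lie on the diagonal

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_p = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_p[idx, idx].mean()

    # cross-entropy in both directions (vision -> text and text -> vision)
    return 0.5 * (xent(logits) + xent(logits.T))

def multi_level_loss(video_emb, category_text_emb, fine_text_emb, alpha=0.5):
    """Coarse-to-fine objective: weighted sum of both contrastive terms."""
    coarse = info_nce(video_emb, category_text_emb)  # category-level guidance (assumed)
    fine = info_nce(video_emb, fine_text_emb)        # fine-grained guidance (assumed)
    return alpha * coarse + (1 - alpha) * fine
```

Under this sketch, pulling video embeddings toward both the category caption and the fine-grained motion description at once is what gives the "jointly optimized, coarse-to-fine" behavior the key points describe.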