FG-SGL: Fine-Grained Semantic Guidance Learning via Motion Process Decomposition for Micro-Gesture Recognition
arXiv cs.CV / 3/18/2026
Key Points
- FG-SGL is a framework that jointly integrates fine-grained and category-level semantics to guide vision-language models for micro-gesture recognition, addressing the subtle inter-class motion variations that make micro-gestures hard to distinguish.
- FG-SGL comprises two modules: FG-SA, which leverages fine-grained semantic cues to learn local motion features, and CP-A, which improves feature separability through category-level semantic guidance.
- To support fine-grained guidance, the authors construct a human-annotated textual dataset describing the dynamic process of each micro-gesture across four refined semantic dimensions.
- A Multi-Level Contrastive Optimization strategy jointly optimizes both modules in a coarse-to-fine manner, and experiments show competitive performance.
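The coarse-to-fine optimization described above can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the paper's implementation: the function names (`info_nce`, `multi_level_loss`), the symmetric InfoNCE formulation, and the `alpha` weighting between the category-level (CP-A-style) and fine-grained (FG-SA-style) terms are all hypothetical stand-ins for whatever losses FG-SGL actually uses.

```python
# Hedged sketch of a multi-level contrastive objective combining a coarse
# (category-level) term and a fine-grained term. All names and the weighting
# scheme are assumptions; the paper's actual losses may differ.
import numpy as np

def info_nce(vision, text, temperature=0.07):
    """Symmetric InfoNCE loss between two batches of paired embeddings."""
    v = vision / np.linalg.norm(vision, axis=1, keepdims=True)
    t = text / np.linalg.norm(text, axis=1, keepdims=True)
    logits = v @ t.T / temperature            # (B, B) cosine-similarity matrix
    idx = np.arange(len(v))                   # matched pairs lie on the diagonal

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_p = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_p[idx, idx].mean()

    # cross-entropy in both directions (vision -> text and text -> vision)
    return 0.5 * (xent(logits) + xent(logits.T))

def multi_level_loss(video_emb, category_text_emb, fine_text_emb, alpha=0.5):
    """Coarse-to-fine objective: weighted sum of both contrastive terms."""
    coarse = info_nce(video_emb, category_text_emb)  # category-level guidance (assumed)
    fine = info_nce(video_emb, fine_text_emb)        # fine-grained guidance (assumed)
    return alpha * coarse + (1 - alpha) * fine
```

Under this sketch, pulling video embeddings toward both the category caption and the fine-grained motion description at once is what gives the "jointly optimized, coarse-to-fine" behavior the key points describe.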