FG-SGL: Fine-Grained Semantic Guidance Learning via Motion Process Decomposition for Micro-Gesture Recognition
arXiv cs.CV / 3/18/2026
Key Points
- FG-SGL is a framework that integrates fine-grained and category-level semantics to guide vision-language models for micro-gesture recognition, where the motion differences between classes are subtle.
- FG-SGL has two modules: FG-SA, which uses fine-grained semantic cues to guide the learning of local motion features, and CP-A, which improves feature separability through category-level semantic guidance.
- To support fine-grained guidance, the approach constructs a fine-grained textual dataset with human annotations describing the dynamic process of micro-gestures in four refined semantic dimensions.
- A Multi-Level Contrastive Optimization strategy jointly optimizes both modules in a coarse-to-fine pattern. The paper reports competitive performance in its experiments.
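
The idea of optimizing a category-level and a fine-grained objective together can be illustrated with a minimal sketch. The summary above does not give the loss used in FG-SGL, so everything here is an assumption made for illustration: the InfoNCE-style form, the temperature of 0.07, the fixed `fine_weight`, and the function names `info_nce` and `multi_level_loss` are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(video_feats, text_feats, temperature=0.07):
    """InfoNCE-style loss (assumed form, not from the paper).

    Video feature i is treated as the positive match for text feature i;
    every other text feature in the batch acts as a negative.
    """
    total = 0.0
    for i, v in enumerate(video_feats):
        logits = [cosine(v, t) / temperature for t in text_feats]
        # Numerically stable log-sum-exp over all candidate texts.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / len(video_feats)


def multi_level_loss(global_feats, category_text,
                     local_feats, fine_text, fine_weight=0.5):
    """Weighted sum of a coarse and a fine contrastive term.

    fine_weight is a hypothetical hyperparameter; the real method may
    schedule or combine the two levels differently.
    """
    coarse = info_nce(global_feats, category_text)
    fine = info_nce(local_feats, fine_text)
    return coarse + fine_weight * fine


# Toy 2-D embeddings: each row is one sample in a batch of two.
aligned_video = [[1.0, 0.0], [0.0, 1.0]]
aligned_text = [[1.0, 0.0], [0.0, 1.0]]
swapped_text = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = multi_level_loss(aligned_video, aligned_text,
                                aligned_video, aligned_text)
loss_swapped = multi_level_loss(aligned_video, swapped_text,
                                aligned_video, swapped_text)
print(loss_aligned < loss_swapped)
```

In this toy setup the loss is small when each video feature lines up with its own text embedding and large when the pairing is swapped, which is the behavior a contrastive objective is meant to reward. A coarse-to-fine pattern could, for example, increase `fine_weight` over training, but that schedule is also an assumption.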