Micro-AU CLIP: Fine-Grained Contrastive Learning from Local Independence to Global Dependency for Micro-Expression Action Unit Detection
arXiv cs.CV / 3/18/2026
Key Points
- The paper introduces Micro-AU CLIP, a framework for micro-expression AU detection that decomposes detection into two stages: local semantic independence (LSI) modeling and global semantic dependency (GSD) modeling.
- In LSI, Patch Token Attention (PTA) maps local features within an AU region to a shared feature space to better capture locality.
- In GSD, Global Dependency Attention (GDA) and Global Dependency Loss (GDLoss) model the global relationships between different AUs under specific emotional states, improving AU feature representations.
- To address CLIP's limitations in micro-semantic alignment, the authors design a microAU contrastive loss (MiAUCL) for fine-grained visual-text alignment of AU features.
- The approach enables micro-AU recognition without emotion labels and reportedly achieves state-of-the-art performance in micro-expression AU detection experiments.
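The paper's code is not reproduced here, but the MiAUCL idea of fine-grained visual-text alignment presumably builds on the CLIP-style symmetric contrastive (InfoNCE) objective between visual AU features and text embeddings of AU descriptions. The sketch below shows that generic objective in NumPy; the function names, the temperature value, and the one-AU-per-row pairing are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project feature vectors onto the unit sphere, as CLIP does."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def contrastive_alignment_loss(au_visual_feats, au_text_feats, temperature=0.07):
    """Symmetric InfoNCE loss over matched visual/text AU features.

    Row i of both matrices is assumed to describe the same AU, so the
    correct matches lie on the diagonal of the similarity matrix.
    (Hypothetical stand-in for the paper's MiAUCL, not the actual loss.)
    """
    v = l2_normalize(au_visual_feats)
    t = l2_normalize(au_text_feats)
    logits = v @ t.T / temperature  # (N, N) cosine similarities, scaled

    def diagonal_cross_entropy(l):
        # Cross-entropy with the target class on the diagonal.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the visual-to-text and text-to-visual directions.
    return 0.5 * (diagonal_cross_entropy(logits)
                  + diagonal_cross_entropy(logits.T))
```

Under this objective, each AU's visual feature is pulled toward its own text embedding and pushed away from the embeddings of the other AUs in the batch, which is what "fine-grained visual-text alignment of AU features" amounts to at the loss level.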