MSGL-Transformer: A Multi-Scale Global-Local Transformer for Rodent Social Behavior Recognition
arXiv cs.CV / 4/10/2026
Key Points
- The paper introduces MSGL-Transformer, a lightweight multi-scale global-local transformer designed to recognize rodent social behaviors from pose-based temporal sequences while reducing reliance on manual scoring.
- It uses parallel attention branches covering short-, medium-, and global temporal ranges, plus a Behavior-Aware Modulation (BAM) block to emphasize behavior-relevant temporal embeddings before attention.
- Experiments on RatSI and CalMS21 show strong performance: 75.4% mean accuracy (F1 = 0.745) on RatSI and 87.1% accuracy (F1 = 0.8745) on CalMS21.
- Results indicate MSGL-Transformer outperforms several baselines (e.g., TCN, LSTM variants, and multiple pose/action recognition architectures) and transfers across datasets with only input dimensionality and class-count adjustments.
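The multi-scale design described above can be sketched in a few lines: each branch runs self-attention restricted to a different temporal window (short, medium, or unbounded/global), and a per-frame gate stands in for the Behavior-Aware Modulation step. This is a minimal, hypothetical numpy sketch for intuition only; the window sizes, the scalar sigmoid gate, and the averaging fusion are assumptions, not details from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def windowed_attention(x, window):
    """Self-attention over a (T, D) pose-embedding sequence.

    Each frame attends only to frames within `window` steps;
    window=None gives the global branch.
    """
    T, D = x.shape
    scores = x @ x.T / np.sqrt(D)              # (T, T) frame similarities
    if window is not None:
        idx = np.arange(T)
        far = np.abs(idx[:, None] - idx[None, :]) > window
        scores = np.where(far, -1e9, scores)   # mask out distant frames
    return softmax(scores, axis=-1) @ x        # (T, D)

def msgl_block(x, windows=(2, 8, None)):
    """Sketch of one multi-scale global-local block.

    A scalar sigmoid gate per frame is a stand-in for the paper's
    Behavior-Aware Modulation (BAM); the real block is not specified
    in this summary. Branch outputs are fused by simple averaging.
    """
    gate = 1.0 / (1.0 + np.exp(-x.mean(axis=-1, keepdims=True)))
    xm = x * gate                              # modulate before attention
    branches = [windowed_attention(xm, w) for w in windows]
    return np.mean(branches, axis=0)           # (T, D)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq = rng.standard_normal((16, 12))        # 16 frames, 12-D pose features
    out = msgl_block(seq)
    print(out.shape)                           # (16, 12)
```

The key point the sketch captures is that all branches share the same input but differ only in their attention mask, so the short-range branch models fast interactions (e.g., sniffing) while the global branch captures long behaviors, and fusing them keeps the model lightweight.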



