AI Navigate

HSEmotion Team at ABAW-10 Competition: Facial Expression Recognition, Valence-Arousal Estimation, Action Unit Detection and Fine-Grained Violence Classification

arXiv cs.AI / 3/16/2026

Key Points

  • The paper reports results for the 10th ABAW competition across frame-wise facial expression recognition, valence-arousal estimation, action unit detection, and fine-grained violence classification.
  • The authors propose a fast approach that extracts facial embeddings with pre-trained EfficientNet-based emotion recognition models. A confidence threshold decides whether to trust the pre-trained model's prediction or fall back to a simple MLP trained on AffWild2 embeddings.
  • Estimated class scores are smoothed with a sliding window to mitigate noise in frame-wise predictions.
  • For the fine-grained violence detection task, they evaluate several pre-trained architectures for frame embeddings and ways of aggregating those embeddings for video classification.
  • Across the four ABAW tasks, the approach significantly improves validation metrics over existing baselines.
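
As an illustration of what aggregating frame embeddings for video classification can look like, here is a minimal sketch that pools per-frame vectors into a single video descriptor by concatenating the per-dimension mean and standard deviation. The summary does not say which aggregation methods the authors compared, so mean-plus-std pooling and the function name `aggregate_frame_embeddings` are assumptions chosen only for illustration.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def aggregate_frame_embeddings(frames: Sequence[Sequence[float]]) -> List[float]:
    """Pool per-frame embeddings into one video descriptor: [means..., stds...].

    Hypothetical helper: mean+std pooling is an assumed aggregation,
    not necessarily the one used in the paper.
    """
    if not frames:
        raise ValueError("need at least one frame embedding")
    # Transpose so that each tuple holds one embedding dimension across frames.
    dims = list(zip(*frames))
    means = [fmean(d) for d in dims]
    stds = [pstdev(d) for d in dims]  # population std; 0.0 for a single frame
    return means + stds


# Two frames with 2-D embeddings give a 4-D video descriptor.
video_descriptor = aggregate_frame_embeddings([[1.0, 2.0], [3.0, 2.0]])
```

The resulting fixed-length vector can be passed to any ordinary classifier, regardless of how many frames the video contains.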

Abstract

This article presents our results for the 10th Affective Behavior Analysis in-the-Wild (ABAW) competition. For frame-wise facial emotion understanding tasks (frame-wise facial expression recognition, valence-arousal estimation, action unit detection), we propose a fast approach based on facial embedding extraction with pre-trained EfficientNet-based emotion recognition models. If the pre-trained model's confidence exceeds a threshold, its prediction is used. Otherwise, we feed the embeddings into a simple multi-layer perceptron trained on the AffWild2 dataset. Estimated class-level scores are smoothed with a fixed-size sliding window to mitigate noise in frame-wise predictions. For the fine-grained violence detection task, we examine several pre-trained architectures for frame embeddings and their aggregation for video classification. Experimental results on four tasks from the ABAW challenge demonstrate that our approach significantly improves validation metrics over existing baselines.
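
The frame-wise pipeline described in the abstract (confidence gating, MLP fallback, sliding-window smoothing) could be sketched roughly as follows. This is not the authors' code. Taking the confidence to be the maximum class score, the threshold value of 0.8, the window size, the centered window that shrinks at the sequence edges, and the names `gated_scores` and `smooth_scores` are all assumptions made for the example. The fallback MLP is represented by an arbitrary callable.

```python
from typing import Callable, List, Sequence

Scores = List[float]


def gated_scores(pretrained: Sequence[float],
                 embedding: Sequence[float],
                 fallback: Callable[[Sequence[float]], Sequence[float]],
                 threshold: float = 0.8) -> Scores:
    """Keep the pre-trained model's scores if its confidence exceeds the threshold.

    Assumption: confidence is the maximum class score. Otherwise the
    embedding is passed to the fallback model (an MLP in the paper).
    """
    if max(pretrained) > threshold:
        return list(pretrained)
    return list(fallback(embedding))


def smooth_scores(frames: Sequence[Sequence[float]], window: int = 5) -> List[Scores]:
    """Average class scores over a centered window of neighboring frames."""
    half = window // 2
    smoothed = []
    for t in range(len(frames)):
        # The window is clipped at the start and end of the video.
        chunk = frames[max(0, t - half):min(len(frames), t + half + 1)]
        n_classes = len(frames[t])
        smoothed.append([sum(f[c] for f in chunk) / len(chunk)
                         for c in range(n_classes)])
    return smoothed


# Stand-in for a trained MLP: always returns the same two class scores.
def dummy_mlp(embedding: Sequence[float]) -> Scores:
    return [0.2, 0.8]


confident = gated_scores([0.9, 0.1], [0.0], dummy_mlp)   # pre-trained scores kept
uncertain = gated_scores([0.6, 0.4], [0.0], dummy_mlp)   # falls back to dummy_mlp
smoothed = smooth_scores([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]], window=3)
```

Max-score gating fits the expression classification task most directly. How confidence is defined for valence-arousal estimation and action unit detection is not stated in this summary.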