Team LEYA in 10th ABAW Competition: Multimodal Ambivalence/Hesitancy Recognition Approach
arXiv cs.AI / 3/16/2026
📰 News · Models & Research
Key Points
- The paper proposes a multimodal approach for video-level ambivalence/hesitancy recognition that integrates scene, face, audio, and text information.
- It employs VideoMAE for scene dynamics, emotion-based face embeddings with statistical pooling, EmotionWav2Vec2.0 with a Mamba temporal encoder for audio, and fine-tuned transformer models for text, followed by prototype-augmented multimodal fusion (see the sketch after this list).
- On the BAH corpus, multimodal fusion outperforms the unimodal baselines, achieving an average MF1 (macro F1) of 83.25% with the best fusion model and 71.43% on the final test set via an ensemble of prototype-augmented models.
- The results underscore the importance of combining multiple cues and robust fusion strategies for accurate ambivalence/hesitancy recognition in unconstrained videos.
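The second bullet packs several architectural pieces into one sentence, so a minimal PyTorch sketch of the general pattern may help: statistical pooling collapses a variable-length stream of per-frame embeddings into a fixed vector, and a late-fusion head concatenates the fixed-size per-modality vectors before classification. Every class name, dimension, and tensor below is an illustrative assumption, not the authors' code; the Mamba temporal encoder and the prototype-augmentation step are omitted for brevity.

```python
import torch
import torch.nn as nn

class StatPool(nn.Module):
    """Collapse a variable-length sequence of frame embeddings (T, D)
    into a fixed vector by concatenating per-dimension mean and std."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat([x.mean(dim=0), x.std(dim=0)])  # -> (2*D,)

class LateFusionHead(nn.Module):
    """Concatenate fixed-size per-modality vectors and classify.
    Hidden size and class count are illustrative, not the paper's values."""
    def __init__(self, in_dims, n_classes=2, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(sum(in_dims), hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, feats):  # feats: list of 1-D tensors, one per modality
        return self.mlp(torch.cat(feats))

# Stand-ins for real extractor outputs (random tensors for the sketch):
pool = StatPool()
face  = pool(torch.randn(120, 512))  # 120 frames of face-emotion embeddings -> (1024,)
scene = torch.randn(768)             # e.g. a pooled VideoMAE clip embedding
audio = torch.randn(512)             # e.g. a pooled wav2vec 2.0 utterance embedding
text  = torch.randn(768)             # e.g. a pooled transformer text embedding

head = LateFusionHead(in_dims=[face.numel(), scene.numel(),
                               audio.numel(), text.numel()])
logits = head([face, scene, audio, text])  # (2,) -> ambivalent/hesitant vs. not
```

A prototype-augmented variant would additionally condition the fused representation on class-prototype vectors before classification; the exact mechanism is specific to the paper and is not reproduced here, and the reported test result ensembles several such models.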