Nuanced Emotion Recognition Based on a Segment-based MLLM Framework Leveraging Qwen3-Omni for AH Detection
arXiv cs.CV / March 17, 2026
Key Points
- This paper proposes a segment-based framework that combines temporal segmentation of videos (up to 5 seconds per clip) with Multimodal Large Language Models to improve detection of nuanced emotions like Ambivalence and Hesitancy.
- The method leverages Qwen3-Omni-30B-A3B, fine-tuned on the BAH dataset with LoRA and full-parameter updates via MS-Swift, enabling integrated analysis of visual, audio, and textual cues.
- Experiments report 85.1% accuracy on the test set, a significant improvement over prior methods on the benchmark, highlighting the ability of multimodal LLMs to capture cross-modal emotional conflicts.
- The work provides an open-source release (GitHub) and points to applications in affective computing and digital health.
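The temporal-segmentation step in the first bullet, splitting a video's timeline into clips of at most 5 seconds before each clip is passed to the MLLM, can be sketched as follows. This is a minimal illustration; the function name and interface are hypothetical, and the paper's actual pipeline may compute boundaries differently.

```python
def segment_boundaries(duration_s: float, max_clip_s: float = 5.0):
    """Split [0, duration_s) into consecutive clips of at most max_clip_s seconds.

    Returns a list of (start, end) tuples; the final clip may be shorter
    than max_clip_s when the video length is not an exact multiple.
    """
    boundaries = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_clip_s, duration_s)
        boundaries.append((start, end))
        start = end
    return boundaries

# Example: a 12-second video yields three clips.
print(segment_boundaries(12.0))  # [(0.0, 5.0), (5.0, 10.0), (10.0, 12.0)]
```

Each resulting (start, end) window would then be cropped from the video and, together with its audio and transcript, fed to the fine-tuned Qwen3-Omni model for per-segment emotion prediction.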
