GAViD: A Large-Scale Multimodal Dataset for Context-Aware Group Affect Recognition from Videos

arXiv cs.CV / 4/20/2026

📰 News · Developer Stack & Infrastructure · Models & Research

Key Points

  • The paper introduces GAViD, a large-scale multimodal dataset for context-aware group affect recognition from videos, addressing a lack of real-world annotated data.
  • GAViD contains 5,091 video clips with multimodal inputs (video, audio, and contextual information) and annotations including ternary valence and discrete emotion labels.
  • The dataset is augmented with VideoGPT-generated contextual metadata and human-annotated action cues to better capture contextual and behavioral variability.
  • The authors propose CAGNet, a context-aware multimodal recognition network, reporting 63.20% test accuracy on GAViD, comparable to state-of-the-art performance.
  • The dataset and code are released publicly for further research and replication via the provided GitHub repository.
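The annotation schema summarized above — ternary valence, a discrete emotion label, VideoGPT-generated context metadata, and human-annotated action cues per clip — can be sketched as a record type. The field names and values below are illustrative assumptions, not the dataset's actual schema; consult the GitHub repository for the real format.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical record layout for one GAViD clip; field names are
# illustrative assumptions, not the dataset's published schema.
@dataclass
class GAViDSample:
    clip_id: str
    video_path: str            # path to the video clip
    audio_path: str            # extracted audio track
    valence: str               # ternary label: "positive" | "neutral" | "negative"
    emotion: str               # discrete group-emotion label
    context: str               # VideoGPT-generated contextual metadata
    action_cues: List[str] = field(default_factory=list)  # human-annotated cues

# Example instance with made-up values.
sample = GAViDSample(
    clip_id="gavid_00001",
    video_path="clips/gavid_00001.mp4",
    audio_path="audio/gavid_00001.wav",
    valence="positive",
    emotion="joy",
    context="A crowd celebrates at an outdoor event.",
    action_cues=["clapping", "cheering"],
)
print(sample.valence)  # prints "positive"
```

A typed record like this makes it easy to iterate over clips and feed each modality (video, audio, context text) to its own encoder.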

Abstract

Understanding affective dynamics in real-world social systems is fundamental to modeling and analyzing human-human interactions in complex environments. Group affect emerges from intertwined human-human interactions, contextual influences, and behavioral cues, making its quantitative modeling a challenging computational social systems problem. Progress on in-the-wild group affect recognition has been held back by the scarcity of large-scale datasets annotated with multimodal and contextual information, and by the inherent complexity of multimodal social interactions shaped by contextual and behavioral variability. To address this, we introduce the Group Affect from ViDeos (GAViD) dataset, comprising 5,091 video clips with multimodal data (video, audio, and context), annotated with ternary valence and discrete emotion labels and enriched with VideoGPT-generated contextual metadata and human-annotated action cues. We also present the Context-Aware Group Affect Recognition Network (CAGNet) for multimodal, context-aware group affect recognition. CAGNet achieves 63.20% test accuracy on GAViD, comparable to state-of-the-art performance. The dataset and code are available at github.com/deepakkumar-iitr/GAViD.
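The abstract does not detail CAGNet's architecture, so the following is only a generic late-fusion baseline for multimodal, context-aware classification — concatenating per-modality embeddings and applying a linear head over the three valence classes. All dimensions and weights are arbitrary assumptions, not the authors' model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-modality embeddings for one clip (dimensions are arbitrary
# assumptions, not CAGNet's actual feature sizes).
video_feat = rng.standard_normal(512)    # e.g. from a video backbone
audio_feat = rng.standard_normal(128)    # e.g. from an audio encoder
context_feat = rng.standard_normal(256)  # e.g. from a text encoder over metadata

# Late fusion: concatenate modality embeddings into a single vector.
fused = np.concatenate([video_feat, audio_feat, context_feat])

# Linear head over the fused vector for the three valence classes
# (random weights here, stand-ins for trained parameters).
num_classes = 3
W = rng.standard_normal((num_classes, fused.shape[0])) * 0.01
logits = W @ fused

# Softmax over logits to obtain class probabilities.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
pred = int(np.argmax(probs))  # index of the predicted valence class
print(pred)
```

In practice a context-aware model would go beyond plain concatenation (e.g. attention between context and audiovisual streams), but the fuse-then-classify shape above is the common starting point.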