GAViD: A Large-Scale Multimodal Dataset for Context-Aware Group Affect Recognition from Videos
arXiv cs.CV / 4/20/2026
Key Points
- The paper introduces GAViD, a large-scale multimodal dataset for context-aware group affect recognition from videos, addressing the scarcity of annotated real-world data for this task.
- GAViD contains 5,091 video clips with multimodal inputs (video, audio, and contextual information) and annotations including ternary valence and discrete emotion labels.
- The dataset is augmented with VideoGPT-generated contextual metadata and human-annotated action cues to better capture contextual and behavioral variability.
- The authors propose CAGNet, a context-aware multimodal recognition network, reporting 63.20% test accuracy on GAViD and matching state-of-the-art performance.
- The dataset and code are released publicly for further research and replication via the provided GitHub repository.
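The summary above does not detail CAGNet's architecture; as a rough illustration of the context-aware multimodal idea it describes (combining video, audio, and contextual features into a ternary valence prediction), here is a minimal late-fusion sketch. All names, feature dimensions, and the fusion-by-concatenation design are hypothetical assumptions for illustration, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimensions -- not taken from the paper.
VIDEO_DIM, AUDIO_DIM, CONTEXT_DIM = 512, 128, 64
NUM_CLASSES = 3  # ternary valence: negative / neutral / positive

def fuse_and_classify(video_feat, audio_feat, context_feat, weights, bias):
    """Late fusion by concatenation, followed by a linear softmax classifier."""
    fused = np.concatenate([video_feat, audio_feat, context_feat])
    logits = weights @ fused + bias
    # Numerically stable softmax over the three valence classes.
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Toy per-clip features and randomly initialised classifier parameters.
video = rng.standard_normal(VIDEO_DIM)
audio = rng.standard_normal(AUDIO_DIM)
context = rng.standard_normal(CONTEXT_DIM)
W = rng.standard_normal((NUM_CLASSES, VIDEO_DIM + AUDIO_DIM + CONTEXT_DIM)) * 0.01
b = np.zeros(NUM_CLASSES)

probs = fuse_and_classify(video, audio, context, W, b)
print(probs)  # probability distribution over the three valence classes
```

In practice the per-modality features would come from pretrained video, audio, and text encoders, and the fusion layer would be trained end-to-end; this sketch only shows the shape of the late-fusion step.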