CEZSAR: A Contrastive Embedding Method for Zero-Shot Action Recognition
arXiv cs.CV / 5/5/2026
Key Points
- The paper introduces CEZSAR, a zero-shot action recognition (ZSAR) method that uses contrastive learning to recognize actions from classes not seen during training.
- It targets two core challenges in ZSAR: the semantic gap between text-derived label representations and visual features, and the domain shift that arises when unknown test sets differ from the training data.
- CEZSAR learns a joint embedding space by encoding videos and sentences and aligning videos with their natural-language descriptions.
- To improve training, the authors propose an automatic negative sampling strategy that generates additional unpaired examples for contrastive learning, each combining a video's visual appearance with an unrelated description.
- Experiments report state-of-the-art performance on UCF-101 and Kinetics-400 across multiple split settings, and the code is released on GitHub.
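
The joint embedding and negative sampling ideas in the key points can be illustrated with a minimal sketch. This summary does not give the paper's loss function, encoders, or sampling rule, so everything here is an assumption: the loss is a generic InfoNCE-style contrastive objective, the embeddings are plain lists of floats standing in for encoder outputs, and `contrastive_loss` and `sample_negative_pairs` are hypothetical names, not functions from the released code.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(video_embs, text_embs, temperature=0.1):
    """InfoNCE-style loss (an assumed stand-in for the paper's objective).

    Video i is treated as matching sentence i; every other sentence in the
    batch acts as a negative for that video.
    """
    n = len(video_embs)
    total = 0.0
    for i in range(n):
        logits = [cosine(video_embs[i], text_embs[j]) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp over all candidate sentences.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[i]
    return total / n


def sample_negative_pairs(labels, descriptions, num_per_video=1, seed=0):
    """Pair each video index with descriptions drawn from other classes.

    This is a hypothetical sampling rule: the summary only says that the
    negatives combine a video's appearance with an unrelated description.
    """
    rng = random.Random(seed)
    pairs = []
    for i, label in enumerate(labels):
        candidates = [j for j, other in enumerate(labels) if other != label]
        chosen = rng.sample(candidates, min(num_per_video, len(candidates)))
        for j in chosen:
            pairs.append((i, descriptions[j]))
    return pairs


# Toy batch: two videos whose embeddings align with their own sentences.
videos = [[1.0, 0.0], [0.0, 1.0]]
aligned_text = [[1.0, 0.0], [0.0, 1.0]]
swapped_text = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = contrastive_loss(videos, aligned_text)
swapped_loss = contrastive_loss(videos, swapped_text)

labels = ["run", "jump", "run"]
descriptions = ["a person runs", "a person jumps", "someone is jogging"]
negatives = sample_negative_pairs(labels, descriptions)
```

In this toy batch the aligned pairing yields a much lower loss than the swapped one, which is the behavior a contrastive objective is meant to reward. Each sampled negative pairs a video with a description from a different class.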