Contrastive learning-based video quality assessment-jointed video vision transformer for video recognition
arXiv cs.CV / 3/12/2026
📰 News · Models & Research
Key Points
- The paper proposes SSL-V3: a Self-Supervised Learning-based Video Vision Transformer combined with No-reference Video Quality Assessment (VQA) for video classification to address label scarcity in VQA.
- It introduces a Combined-SSL mechanism in which predicted video quality scores directly modulate the feature maps used for video classification, so the supervised classification objective in turn tunes the VQA branch.
- The approach leverages self-supervised learning to fuse VQA with video recognition, mitigating the scarcity of labeled VQA data by using the classification task as supervision.
- It reports robust results on two datasets, including an accuracy of 94.87% on interview videos from the I-CONECT healthcare dataset, demonstrating effectiveness.
- By explicitly accounting for video quality, the joint framework improves both quality assessment and recognition performance.
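The core idea in the key points above, quality scores directly tuning classification feature maps, can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the paper's implementation: the function `quality_modulated_features`, the sigmoid gate, and the `gamma` parameter are all assumptions about how a scalar no-reference quality score in [0, 1] might rescale a feature map before classification.

```python
import numpy as np

def quality_modulated_features(features, quality_score, gamma=4.0):
    """Hypothetical sketch: gate classification feature maps by a
    predicted no-reference quality score in [0, 1].

    A sigmoid centered at 0.5 maps the score to a multiplicative gate,
    so low-quality clips contribute weaker features to the classifier.
    """
    gate = 1.0 / (1.0 + np.exp(-gamma * (quality_score - 0.5)))
    return features * gate

# Toy (batch, dim) feature maps from a video backbone.
feats = np.ones((2, 4))
low_q = quality_modulated_features(feats, quality_score=0.1)
high_q = quality_modulated_features(feats, quality_score=0.9)
```

Under this sketch, a higher quality score yields a larger gate, so `high_q` retains more of the original feature magnitude than `low_q`; the classification loss would then backpropagate through the gate into the quality predictor, which is one plausible reading of the "supervised objective tunes VQA" linkage described above.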