SF20K Competition 2025: Summary and findings
arXiv cs.CV / 5/5/2026
Key Points
- The SF20K Competition 2025, run alongside the SLoMO Workshop at ICCV 2025, focused on story-level video understanding via an open-ended video question-answering task using amateur short films.
- Models were evaluated on the SF20K-Test benchmark (95 movies, 979 QA pairs) with an automated judging approach (LLM-QA-Eval) powered by GPT-4.1-nano; an illustrative judging call is sketched after this list.
- The competition drew 22 teams and 286 submissions across a Main Track (unrestricted model size) and a Special Track (models under 8B parameters); the top team reached 65.7% and 48.7% accuracy in the two tracks, respectively.
- Key findings: narrative-aware, shot-level processing beats uniform frame sampling (see the sampling sketch after this list); multi-stage pipelines built on smaller models can rival far larger end-to-end models; and subtitle quality is a major performance driver.
- The results suggest the main bottleneck in long-form video QA is information selection and reasoning structure rather than raw model capacity, and there remains a large gap to human-level narrative comprehension.
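The LLM-QA-Eval protocol mentioned above is, in essence, an LLM-as-judge comparison between a model's free-form answer and the reference answer. The competition report does not publish its exact prompt or grading code, so the snippet below is only a minimal sketch of how such a judge could be called through the OpenAI Python client; the prompt wording, the `judge_answer` and `accuracy` helpers, and the grading convention are assumptions for illustration.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical judge prompt; the competition's actual prompt is not published.
JUDGE_PROMPT = """You are grading an open-ended video question-answering task.
Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}
Reply with exactly one word: CORRECT or INCORRECT."""

def judge_answer(question: str, reference: str, candidate: str) -> bool:
    """Ask the judge model whether the candidate answer matches the reference."""
    response = client.chat.completions.create(
        model="gpt-4.1-nano",  # judge model named in the competition report
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, reference=reference, candidate=candidate)}],
        temperature=0.0,  # deterministic grading
    )
    verdict = response.choices[0].message.content.strip().upper()
    return verdict.startswith("CORRECT")

def accuracy(examples: list[dict]) -> float:
    """Fraction of examples judged correct; each dict holds question, reference, candidate."""
    correct = sum(judge_answer(e["question"], e["reference"], e["candidate"])
                  for e in examples)
    return correct / len(examples)
```

One finding above is that shot-aware frame selection outperforms uniform sampling. Again, the report does not prescribe an implementation, so the following is one plausible way to realize it, using PySceneDetect to locate shot boundaries and OpenCV to grab a representative frame per shot; the function names and the choice of `ContentDetector` are illustrative assumptions, not the competitors' pipelines.

```python
import cv2
from scenedetect import detect, ContentDetector

def shot_level_frames(video_path: str, max_shots: int | None = None):
    """Return one middle frame per detected shot instead of uniformly spaced frames."""
    # Each detected scene is a (start, end) pair of FrameTimecodes.
    scenes = detect(video_path, ContentDetector())
    if max_shots is not None:
        scenes = scenes[:max_shots]

    cap = cv2.VideoCapture(video_path)
    frames = []
    for start, end in scenes:
        mid = (start.get_frames() + end.get_frames()) // 2
        cap.set(cv2.CAP_PROP_POS_FRAMES, mid)
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames

def uniform_frames(video_path: str, n: int = 32):
    """Uniform-sampling baseline: n frames at evenly spaced indices."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(n):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(i * total / n))
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames
```

The shot-level variant spends its frame budget on narrative units (one frame per cut) rather than on wall-clock time, which is the intuition behind the finding reported by the organizers.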