TR-EduVSum: A Turkish-Focused Dataset and Consensus Framework for Educational Video Summarization
arXiv cs.CL / 4/10/2026
Key Points
- The paper introduces TR-EduVSum, a new Turkish-focused educational video summarization dataset built from 82 Data Structures and Algorithms course videos and 3,281 independent human summaries.
- It proposes AutoMUP (Automatic Meaning Unit Pyramid), a framework that extracts meaning units from multiple human summaries, clusters them using embeddings, and statistically models inter-participant agreement.
- AutoMUP produces graded “gold-standard” summaries by weighting meaning units according to consensus, with the top-consensus configuration defined as the reference summary.
- Experiments report that AutoMUP summaries achieve high semantic overlap with strong LLM-generated summaries (e.g., Flash 2.5 and GPT-5.1), suggesting the framework can approximate high-quality model outputs.
- Ablation studies highlight that consensus weighting and clustering are key drivers of summary quality, and the authors argue the approach generalizes to other Turkic languages at low cost.
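
The clustering and consensus-weighting steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: this summary does not specify the embedding model, the clustering algorithm, or the statistical agreement model, so the sketch assumes toy bag-of-words vectors, greedy cosine-threshold clustering, and a plain count of distinct participants as the consensus weight. All function names, the threshold, and the sample meaning units are hypothetical.

```python
from collections import Counter
from math import sqrt


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real sentence encoder (assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_units(units, threshold=0.5):
    """Greedy clustering: a unit joins the first cluster whose seed is similar enough.

    units: list of (participant_id, meaning_unit_text) pairs.
    """
    clusters = []
    for pid, text in units:
        vec = embed(text)
        for cluster in clusters:
            if cosine(vec, cluster["seed"]) >= threshold:
                cluster["members"].append((pid, text))
                break
        else:
            # No existing cluster is close enough, so this unit seeds a new one.
            clusters.append({"seed": vec, "members": [(pid, text)]})
    return clusters


def consensus_weights(clusters):
    """Weight each cluster by how many distinct participants expressed it."""
    weighted = [
        (len({pid for pid, _ in c["members"]}), c["members"][0][1])
        for c in clusters
    ]
    return sorted(weighted, key=lambda pair: -pair[0])


def reference_summary(weighted, min_weight):
    """Keep only meaning units that reach the chosen consensus level."""
    return [text for weight, text in weighted if weight >= min_weight]


# Hypothetical meaning units taken from three participants' summaries.
units = [
    ("p1", "a stack is last in first out"),
    ("p2", "the stack is last in first out"),
    ("p3", "a stack works last in first out"),
    ("p1", "push adds an element"),
    ("p2", "push adds one element"),
    ("p3", "queues are used in scheduling"),
]

weighted = consensus_weights(cluster_units(units))
summary = reference_summary(weighted, min_weight=2)
```

Raising `min_weight` keeps only the meaning units that more participants agree on, which mirrors the idea of graded summaries with the top-consensus configuration serving as the reference.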