DariMis: Harm-Aware Modeling for Dari Misinformation Detection on YouTube
arXiv cs.CL / 3/25/2026
Key Points
- DariMis is presented as the first manually annotated dataset of 9,224 Dari-language YouTube videos, labeled by both information type (misinformation/partly true/true) and harm level (low/medium/high).
- The study finds the two labeling dimensions are structurally coupled: 55.9% of misinformation videos carry at least medium harm versus only 1.0% of true content, suggesting misinformation classifiers can act as implicit harm-triage filters (see the tabulation sketch after this list).
- A pair-input encoding approach (feeding the YouTube title and description as two separate BERT segments) improves misinformation recall by 7.0 percentage points (60.1% to 67.1%) over single-field concatenation (see the tokenizer sketch after this list).
- In benchmarking, a Dari/Farsi-focused ParsBERT model outperforms XLM-RoBERTa-base, reaching 76.60% accuracy and 72.77% macro F1 on the test set, with confidence intervals reported for the metrics (see the bootstrap sketch after this list).
- The work emphasizes both practical value for safety-critical detection and statistical limitations, using an ablation study and discussion of macro vs. minority-class tradeoffs.
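
The coupling claim is straightforward to audit once the annotations are loaded. Below is a minimal pandas sketch, assuming hypothetical column names `info_type` and `harm` (the released schema may differ), that tabulates how often each information-type class receives at least medium harm.

```python
# Sketch: checking the coupling between information-type and harm labels.
# Column names ("info_type", "harm") are hypothetical; the released dataset
# may use a different schema.
import pandas as pd

df = pd.DataFrame({
    "info_type": ["misinformation", "misinformation", "true", "partly_true"],
    "harm":      ["high", "medium", "low", "medium"],
})  # stand-in rows; load the real annotations here

# Fraction of each information-type class labeled at least "medium" harm.
at_least_medium = df["harm"].isin(["medium", "high"])
coupling = at_least_medium.groupby(df["info_type"]).mean()
print(coupling)  # the paper reports ~55.9% for misinformation vs ~1.0% for true
```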
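The pair-input encoding amounts to passing the two fields as a sentence pair, so the tokenizer assigns them distinct segment (token-type) IDs rather than merging them into one string. A minimal Hugging Face `transformers` sketch follows; the `HooshvareLab/bert-base-parsbert-uncased` checkpoint and the max length of 256 are assumptions, not the paper's confirmed setup.

```python
# Sketch of the pair-input idea: feed title and description as two BERT
# segments instead of concatenating them into a single field.
from transformers import AutoTokenizer

# Assumed checkpoint; the paper's exact ParsBERT variant may differ.
tokenizer = AutoTokenizer.from_pretrained("HooshvareLab/bert-base-parsbert-uncased")

title = "..."        # YouTube video title (Dari)
description = "..."  # YouTube video description (Dari)

# Single-field baseline: one segment, plain concatenation.
single = tokenizer(title + " " + description, truncation=True, max_length=256)

# Pair-input encoding: the tokenizer builds [CLS] title [SEP] description [SEP]
# and assigns token_type_ids 0 to the title tokens and 1 to the description.
pair = tokenizer(title, description, truncation=True, max_length=256)
print(pair["token_type_ids"])
```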
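For the reported confidence intervals, a standard choice is a percentile bootstrap over test examples. The paper's exact interval method is not stated here, so treat the following scikit-learn sketch, with stand-in labels and predictions, as one plausible protocol rather than the authors' procedure.

```python
# Sketch: accuracy and macro F1 with percentile-bootstrap confidence intervals.
# The resampling scheme is an assumption; the paper may compute CIs differently.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(0)
y_true = np.array([0, 1, 2, 1, 0, 2, 1, 0])  # stand-in test labels
y_pred = np.array([0, 1, 1, 1, 0, 2, 2, 0])  # stand-in model predictions

def bootstrap_ci(metric, y_true, y_pred, n_boot=1000, alpha=0.05):
    """Percentile bootstrap CI from resampling test examples with replacement."""
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        scores.append(metric(y_true[idx], y_pred[idx]))
    lo, hi = np.quantile(scores, [alpha / 2, 1 - alpha / 2])
    return lo, hi

print("accuracy:", accuracy_score(y_true, y_pred),
      bootstrap_ci(accuracy_score, y_true, y_pred))
print("macro F1:", f1_score(y_true, y_pred, average="macro"),
      bootstrap_ci(lambda t, p: f1_score(t, p, average="macro"), y_true, y_pred))
```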