DIA-HARM: Dialectal Disparities in Harmful Content Detection Across 50 English Dialects
arXiv cs.CL · 2026-04-08
Key points
- The paper introduces DIA-HARM, a benchmark to evaluate disinformation/harmful-content detectors across 50 English dialects rather than only Standard American English.
- It releases the D3 corpus (195K samples) built via linguistically grounded transformations from established disinformation benchmarks, enabling dialect-robust testing.
- Evaluating 16 detection models reveals systematic weaknesses: human-written dialectal content lowers F1 by 1.4–3.6%, while performance on AI-generated content stays comparatively stable.
- Fine-tuned transformers outperform zero-shot LLM approaches (best-case F1 of 96.6% vs. 78.3%), and some models suffer catastrophic degradation (F1 drops exceeding 33%), especially on mixed content.
- Cross-dialect transfer results show that multilingual models (e.g., mDeBERTa, average F1 97.2%) generalize well, whereas monolingual models (RoBERTa, XLM-RoBERTa) degrade more on dialectal inputs, pointing to a potential unfair disadvantage for non-SAE speakers.
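The dialect-robustness comparison above boils down to measuring per-dialect F1 against a Standard American English (SAE) baseline. A minimal sketch of that computation, using entirely hypothetical confusion counts (the dialect names and numbers below are illustrative, not from the paper):

```python
# Sketch: per-dialect F1 and degradation relative to an SAE baseline.
# All counts are made-up placeholders for illustration.

def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical (tp, fp, fn) counts per dialect.
counts = {
    "SAE":   (920, 40, 40),
    "AAVE":  (880, 60, 80),
    "Scots": (890, 55, 70),
}

baseline = f1(*counts["SAE"])
for dialect, (tp, fp, fn) in counts.items():
    score = f1(tp, fp, fn)
    drop = (baseline - score) * 100  # degradation vs. SAE, in points
    print(f"{dialect}: F1={score:.3f}  drop={drop:+.1f}")
```

A per-dialect table of such drops is the kind of evidence behind the 1.4–3.6% degradation figure reported for human-written dialectal content.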
