A multilingual hallucination benchmark: MultiWikiQHalluA
arXiv cs.CL / 5/5/2026
Key Points
- The new arXiv paper introduces a multilingual hallucination benchmark (MultiWikiQHalluA) to address the gap that most hallucination evaluations are conducted only in English.
- It defines “faithfulness hallucinations” as fluent, plausible outputs that either contradict the provided input or are internally inconsistent, and builds multilingual synthetic hallucination datasets using MultiWikiQA and the LettuceDetect framework.
- The authors train token-level hallucination classifiers for 30 European languages and evaluate hallucination rates across selected languages (English, Danish, German, Icelandic).
- Results show that the small Qwen3-0.6B model has markedly high hallucination rates (up to 60% of answers contain at least one hallucination, highest in Icelandic), while larger models generally hallucinate less.
- Hallucination rates are consistently higher in lower-resource languages, indicating that language coverage and resource availability significantly affect model faithfulness.
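The evaluation pipeline the key points describe — tag tokens as hallucinated, then report the fraction of answers containing at least one flagged token — can be sketched as below. This is an illustrative stub, not the authors' trained classifiers or the LettuceDetect API; the token labels are assumed to come from an upstream token-level model.

```python
# Illustrative sketch of per-answer hallucination-rate aggregation.
# A real system (e.g. the paper's LettuceDetect-based token classifiers)
# would produce one hallucination label per token from a trained model;
# here the labels are supplied directly as toy data.

def aggregate_hallucination_rate(token_labels_per_answer):
    """Fraction of answers containing at least one hallucinated token.

    token_labels_per_answer: list of per-answer label lists, where
    1 marks a token flagged as hallucinated and 0 marks a faithful token.
    """
    if not token_labels_per_answer:
        return 0.0
    flagged = sum(1 for labels in token_labels_per_answer if any(labels))
    return flagged / len(token_labels_per_answer)

# Toy example: 3 of 5 answers contain a flagged token, giving a rate
# of 0.6 -- the same shape of statistic as the "up to 60% of answers"
# figure reported for Qwen3-0.6B (the data here is invented).
answers = [
    [0, 0, 1, 0],  # contains a hallucinated span
    [0, 0, 0],     # faithful
    [1, 1, 0],     # contains a hallucinated span
    [0, 0, 0, 0],  # faithful
    [0, 1],        # contains a hallucinated span
]
print(aggregate_hallucination_rate(answers))  # -> 0.6
```

Aggregating at the answer level (any flagged token marks the whole answer) is one common convention; a token-level rate would instead average label means across answers.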