KMMMU: Evaluation of Massive Multi-discipline Multimodal Understanding in Korean Language and Context
arXiv cs.CL / 4/16/2026
Key Points
- The paper introduces KMMMU, a native Korean multimodal benchmark designed to evaluate understanding under Korean cultural, institutional, and discipline-specific visual conventions rather than in English- or translation-based settings.
- KMMMU includes 3,466 exam-style Korean questions across nine disciplines and nine visual modality categories, plus a Korean-specific 300-item subset and a 627-question hard subset.
- Experimental results show that the best open-source model achieves only 42.05% accuracy on the full set, and even the top proprietary model reaches just 52.42% on the hard subset.
- Performance is uneven across disciplines, with Korean-specific questions exhibiting accuracy gaps of up to 13.43%, indicating persistent weaknesses in understanding localized conventions and standards (a scoring sketch follows this list).
- Error analysis suggests that failures stem more from convention-to-label mapping, limited few-shot symbolic induction, localized knowledge recall, and domain-standard comprehension than from insufficient reasoning depth.
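The headline metrics are plain multiple-choice accuracy, sliced by discipline and by the Korean-specific subset. As a rough sketch only, not the paper's evaluation code, and with hypothetical record fields (`discipline`, `is_korean_specific`, `answer`, `prediction`), the per-discipline breakdown and the subset gap could be computed like this:

```python
from collections import defaultdict

def accuracy_report(records):
    """Print overall, per-discipline, and Korean-specific-subset accuracy.

    Each record is a dict with hypothetical fields:
      discipline         -- one of the benchmark's nine disciplines
      is_korean_specific -- True for items in the Korean-specific subset
      answer             -- gold choice label, e.g. "B"
      prediction         -- the model's chosen label
    """
    per_disc = defaultdict(lambda: [0, 0])  # discipline -> [correct, total]
    ko, rest = [0, 0], [0, 0]               # subset counters: [correct, total]
    for r in records:
        hit = int(r["prediction"] == r["answer"])
        per_disc[r["discipline"]][0] += hit
        per_disc[r["discipline"]][1] += 1
        bucket = ko if r["is_korean_specific"] else rest
        bucket[0] += hit
        bucket[1] += 1

    correct = sum(c for c, _ in per_disc.values())
    total = sum(n for _, n in per_disc.values())
    print(f"overall: {100 * correct / total:.2f}%")
    for disc, (c, n) in sorted(per_disc.items()):
        print(f"{disc:20s} {100 * c / n:.2f}%  (n={n})")
    if ko[1] and rest[1]:
        gap = 100 * rest[0] / rest[1] - 100 * ko[0] / ko[1]
        print(f"Korean-specific gap: {gap:.2f} points")

# Minimal usage with toy records (not real benchmark data):
accuracy_report([
    {"discipline": "Law", "is_korean_specific": True, "answer": "B", "prediction": "B"},
    {"discipline": "Art", "is_korean_specific": False, "answer": "C", "prediction": "A"},
])
```

Here the gap is simply accuracy on non-Korean-specific items minus accuracy on the Korean-specific subset; the exact baseline behind the paper's 13.43% figure may differ.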