EuraGovExam: A Multilingual Multimodal Benchmark from Real-World Civil Service Exams
arXiv cs.CV / 3/31/2026
Key Points
- The paper introduces EuraGovExam, a new multilingual, multimodal benchmark built from real civil service examinations from five Eurasian regions (South Korea, Japan, Taiwan, India, and the European Union).
- The dataset contains 8,000+ high-resolution scanned multiple-choice questions across 17 domains, with all text and visual elements embedded into single images to test layout-aware reasoning.
- EuraGovExam differs from prior benchmarks by requiring models to perform cross-lingual, visual-layout reasoning directly from image input rather than relying on separated OCR/text fields.
- Evaluation results report that even state-of-the-art vision-language models reach only 86% accuracy, highlighting current limitations in handling culturally realistic and visually complex exam documents.
- The benchmark is positioned to support development and evaluation for e-governance and public-sector document analysis, as well as more equitable multilingual exam preparation.
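
As a rough illustration of how a multiple-choice benchmark like this might be scored, the sketch below computes accuracy grouped by region or by domain. The record fields (`region`, `domain`, `answer`, `predicted`) and the sample rows are hypothetical, chosen only for this example; they are not the dataset's actual schema or results from the paper.

```python
from collections import defaultdict


def accuracy_by_group(records, key):
    """Return {group: fraction of correct predictions} for the given key."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        group = r[key]
        total[group] += 1
        # A prediction counts as correct only if it matches the gold choice.
        if r["predicted"] == r["answer"]:
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}


# Made-up records for illustration only; not data from the paper.
records = [
    {"region": "Japan", "domain": "law", "answer": "B", "predicted": "B"},
    {"region": "Japan", "domain": "math", "answer": "A", "predicted": "C"},
    {"region": "India", "domain": "law", "answer": "D", "predicted": "D"},
]

print(accuracy_by_group(records, "region"))
print(accuracy_by_group(records, "domain"))
```

Breaking accuracy down this way would show whether a single headline figure hides weaker performance in particular regions or subject areas.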