KazakhOCR: A Synthetic Benchmark for Evaluating Multimodal Models in Low-Resource Kazakh Script OCR
arXiv cs.CV / March 17, 2026
Key Points
- KazakhOCR introduces a synthetic OCR benchmark with 7,219 images across Kazakh scripts (Arabic, Cyrillic, and Latin) to evaluate multimodal models on OCR and language identification.
- The authors evaluate Gemma-3-12B-it, Qwen2.5-VL-7B-Instruct, and Llama-3.2-11B-Vision-Instruct, finding that none performs well on Latin or Arabic script OCR, and that all tend to misclassify Arabic-script Kazakh text as other languages.
- A classical OCR baseline achieves lower character error rates than all three multimodal models, highlighting how current MLLMs underperform on low-resource scripts.
- The results underscore the need for inclusive models and benchmarks to support low-resource scripts and languages, driving future research and dataset development.
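The character error rate (CER) used to compare models can be made concrete: it is the character-level edit distance between a model's transcription and the ground-truth text, normalized by the reference length. A minimal sketch, assuming a standard Levenshtein-distance definition (function names are illustrative, not taken from the paper's evaluation code):

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance via dynamic programming."""
    prev = list(range(len(hyp) + 1))
    for i, rc in enumerate(ref, 1):
        curr = [i]
        for j, hc in enumerate(hyp, 1):
            cost = 0 if rc == hc else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """CER = edit distance / reference length; lower is better."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)

# Example with Cyrillic Kazakh: one substituted character out of five.
print(cer("сәлем", "сәлам"))  # → 0.2
```

A classical OCR system beating MLLMs on this metric means its transcriptions require fewer per-character corrections, which is the sense in which the paper reports traditional OCR as stronger on these scripts.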