AtlasOCR: Building the First Open-Source Darija OCR Model with Vision Language Models

arXiv cs.CV / 4/10/2026


Key Points

The paper presents AtlasOCR, described as the first open-source OCR model tailored specifically for Darija (Moroccan Arabic), built by fine-tuning a 3B-parameter Vision Language Model.
It details a data pipeline combining Darija-specific dataset curation with synthetic text generation (via the authors’ OCRSmith library) plus carefully sourced real-world samples.
The authors use parameter-efficient fine-tuning (QLoRA) with Unsloth to train Qwen2.5-VL 3B, and run ablation studies to tune the training hyperparameters.
AtlasOCR is evaluated on a new benchmark (AtlasOCRBench) and the established KITAB-Bench, where it reportedly achieves state-of-the-art results and demonstrates strong generalization across Darija and standard Arabic OCR tasks.
The work positions the model as competitive with larger OCR systems, emphasizing robustness and transferability rather than relying solely on scale.
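To make the "parameter-efficient" point concrete, here is a minimal arithmetic sketch of how few trainable weights a LoRA adapter adds to a single linear layer. The layer shape (2048 x 2048) and rank (16) are illustrative assumptions, not values reported for AtlasOCR. QLoRA additionally keeps the frozen base weights in 4-bit quantized form, which this sketch does not model.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer.

    LoRA learns two low-rank factors: A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in the original dense weight matrix (bias ignored)."""
    return d_in * d_out


# Hypothetical layer shape and rank, chosen only for illustration.
D_IN, D_OUT, RANK = 2048, 2048, 16

added = lora_param_count(D_IN, D_OUT, RANK)
full = full_param_count(D_IN, D_OUT)

print(f"full layer:   {full:,} parameters")
print(f"LoRA adapter: {added:,} parameters")
print(f"ratio:        {added / full:.4%}")
```

Under these assumed numbers the adapter trains roughly 1.6% as many weights as the dense layer it wraps, which is why a 3B-parameter model can be fine-tuned on modest hardware.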

Abstract

Darija, the Moroccan Arabic dialect, is rich in visual content yet lacks specialized Optical Character Recognition (OCR) tools. This paper introduces AtlasOCR, the first open-source Darija OCR model built by fine-tuning a 3B parameter Vision Language Model (VLM). We detail our comprehensive approach, from curating a unique Darija-specific dataset leveraging both synthetic generation with our OCRSmith library and carefully sourced real-world data, to implementing efficient fine-tuning strategies. We utilize QLoRA and Unsloth for parameter-efficient training of Qwen2.5-VL 3B and present comprehensive ablation studies optimizing key hyperparameters. Our evaluation on the newly curated AtlasOCRBench and the established KITAB-Bench demonstrates state-of-the-art performance, challenging larger models and highlighting AtlasOCR's robustness and generalization capabilities for both Darija and standard Arabic OCR tasks.
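The summary does not say which metrics the benchmarks report. As an assumption, the sketch below uses character error rate (CER) and word error rate (WER), two metrics commonly used to score OCR output, both built on Levenshtein edit distance. The function names and the sample strings are hypothetical and are not taken from AtlasOCRBench or KITAB-Bench.

```python
from typing import Sequence


def levenshtein(ref: Sequence, hyp: Sequence) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn `hyp` into `ref` (works on strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance over whitespace-separated tokens."""
    ref_tokens = reference.split()
    if not ref_tokens:
        raise ValueError("reference must contain at least one word")
    return levenshtein(ref_tokens, hypothesis.split()) / len(ref_tokens)


# Hypothetical ground truth and model output, for illustration only.
truth = "السلام عليكم"
prediction = "السلام عليكن"
print(f"CER = {cer(truth, prediction):.3f}")
print(f"WER = {wer(truth, prediction):.3f}")
```

In practice, Arabic-script evaluation often normalizes text first (for example, stripping diacritics or unifying letter variants); whether and how the paper does so is not stated here.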
