AI Navigate

Fanar 2.0: Arabic Generative AI Stack

arXiv cs.CL / 3/18/2026

📰 News · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

Key Points

  • Fanar 2.0 is the second generation of Qatar's Arabic-centric Generative AI platform, designed and operated entirely in-house at QCRI with sovereignty as a core principle.
  • It runs on 256 NVIDIA H100 GPUs and pairs a data-quality-first strategy with targeted continual pre-training and model merging, achieving substantial gains with 8x fewer pre-training tokens than Fanar 1.0.
  • The core Fanar-27B model is continually pre-trained from a Gemma-3-27B backbone on a curated corpus of 120 billion high-quality tokens spanning three data recipes, delivering benchmark gains of +9.1 points in Arabic knowledge, +7.3 in Arabic language, +3.5 in dialects, and +7.6 in English capability.
  • The Fanar 2.0 stack adds new capabilities: FanarGuard moderation, Aura long-form ASR, Oryx Arabic-aware image/video understanding and generation, an agentic tool-calling framework for multi-step workflows, Fanar-Sadiq for Islamic content, Fanar-Diwan for classical Arabic poetry generation, FanarShaheen bilingual translation, and a redesigned multi-layer orchestrator for intent-aware routing and safety validation. Together these show that sovereign, resource-constrained AI can rival systems built at far larger scale.
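The multi-layer orchestrator mentioned above can be pictured with a minimal routing sketch. This is a hypothetical illustration only, not the paper's implementation: the intent labels, the `detect_intent`, `is_safe`, and `route` functions, and the keyword rules are all invented for clarity.

```python
# Hypothetical sketch of intent-aware routing with a safety gate on both
# input and output, loosely modeled on the "defense-in-depth" orchestrator
# described in the summary. All names here are illustrative inventions.

def detect_intent(query: str) -> str:
    """Toy intent classifier: keyword rules stand in for a learned router."""
    q = query.lower()
    if "translate" in q:
        return "translation"
    if "poem" in q or "poetry" in q:
        return "poetry"
    return "general"

# Map each intent to a stand-in for the specialized component that serves it.
HANDLERS = {
    "translation": lambda q: f"[FanarShaheen] {q}",
    "poetry": lambda q: f"[Fanar-Diwan] {q}",
    "general": lambda q: f"[Fanar-27B] {q}",
}

BLOCKLIST = {"unsafe"}  # placeholder for a real moderation model

def is_safe(text: str) -> bool:
    """Stand-in for a FanarGuard-style moderation check."""
    return not any(term in text.lower() for term in BLOCKLIST)

def route(query: str) -> str:
    # Defense in depth: validate the input, route by intent, then
    # validate the generated output before returning it.
    if not is_safe(query):
        return "[blocked by input filter]"
    response = HANDLERS[detect_intent(query)](query)
    if not is_safe(response):
        return "[blocked by output filter]"
    return response
```

In a production orchestrator both `detect_intent` and `is_safe` would be learned models (the summary describes FanarGuard as a 4B bilingual moderation filter), but the control flow, classify, dispatch, and gate, is the part this sketch is meant to convey.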

Abstract

We present Fanar 2.0, the second generation of Qatar's Arabic-centric Generative AI platform. Sovereignty is a first-class design principle: every component, from data pipelines to deployment infrastructure, was designed and operated entirely at QCRI, Hamad Bin Khalifa University. Fanar 2.0 is a story of resource-constrained excellence: the effort ran on 256 NVIDIA H100 GPUs, with Arabic accounting for only ~0.5% of web data despite its 400 million native speakers. Fanar 2.0 adopts a disciplined strategy of data quality over quantity, targeted continual pre-training, and model merging to achieve substantial gains within these constraints. At the core is Fanar-27B, continually pre-trained from a Gemma-3-27B backbone on a curated corpus of 120 billion high-quality tokens across three data recipes. Despite using 8x fewer pre-training tokens than Fanar 1.0, it delivers substantial benchmark improvements: Arabic knowledge (+9.1 pts), language (+7.3 pts), dialects (+3.5 pts), and English capability (+7.6 pts). Beyond the core LLM, Fanar 2.0 introduces a rich stack of new capabilities. FanarGuard is a state-of-the-art 4B bilingual moderation filter for Arabic safety and cultural alignment. The speech family Aura gains a long-form ASR model for hours-long audio. The Oryx vision family adds Arabic-aware image and video understanding alongside culturally grounded image generation. An agentic tool-calling framework enables multi-step workflows. Fanar-Sadiq utilizes a multi-agent architecture for Islamic content. Fanar-Diwan provides classical Arabic poetry generation. FanarShaheen delivers LLM-powered bilingual translation. A redesigned multi-layer orchestrator coordinates all components through intent-aware routing and defense-in-depth safety validation. Taken together, Fanar 2.0 demonstrates that sovereign, resource-constrained AI development can produce systems competitive with those built at far greater scale.
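The abstract cites model merging as one of the levers behind Fanar-27B's gains. As background, a common form of merging is a weighted average of checkpoint parameters in weight space; the sketch below shows that generic idea on toy weight dictionaries. It is not the paper's specific recipe, and the checkpoint names and `merge_checkpoints` helper are invented for illustration.

```python
# Illustrative weight-space model merging: a weighted average of two
# checkpoints, represented here as plain dicts of parameter lists.
# This demonstrates the generic linear-merge technique only; Fanar 2.0's
# actual merging recipe is not specified in this summary.

def merge_checkpoints(a: dict, b: dict, alpha: float = 0.5) -> dict:
    """Linear interpolation: merged = alpha * a + (1 - alpha) * b."""
    assert a.keys() == b.keys(), "checkpoints must share parameter names"
    return {
        name: [alpha * wa + (1 - alpha) * wb
               for wa, wb in zip(a[name], b[name])]
        for name in a
    }

# Two toy "checkpoints" sharing the same parameter names and shapes,
# e.g. a base model and a domain-tuned variant of it.
base = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
tuned = {"layer.weight": [3.0, 4.0], "layer.bias": [2.0]}

merged = merge_checkpoints(base, tuned, alpha=0.5)
# merged["layer.weight"] == [2.0, 3.0]; merged["layer.bias"] == [1.0]
```

Real merging toolchains operate on tensor state dicts and offer fancier schemes than plain interpolation (e.g. spherical or task-vector merges), but the core operation, combining same-shaped parameters from multiple fine-tuned checkpoints, is what the toy version above captures.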