SAHM: A Benchmark for Arabic Financial and Shari'ah-Compliant Reasoning
arXiv cs.CL / 4/22/2026
Key Points
- The paper introduces SAHM, a new Arabic financial NLP benchmark and instruction-tuning dataset focused on document-grounded and Shari'ah-compliant reasoning.
- SAHM includes 14,380 expert-verified examples across seven tasks, covering AAOIFI standards QA, fatwa-based QA/MCQ, accounting/business exams, sentiment analysis, extractive summarization, and event-cause reasoning.
- The authors evaluate 19 open and proprietary LLMs with task-specific metrics and rubric-based scoring for open-ended responses.
- Results show that strong Arabic language ability does not reliably translate into evidence-grounded financial reasoning, with the biggest performance gaps on event-cause reasoning.
- The benchmark, evaluation framework, and an instruction-tuned model are released to enable further research into trustworthy Arabic financial NLP.
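As a rough illustration of how task-specific metrics and rubric-based scoring can sit side by side, here is a minimal sketch in Python. It is not the authors' evaluation framework: the function names, the rubric criteria ("grounding", "correctness", "completeness"), the 0-5 rating scale, and the sample data are all hypothetical assumptions made for this example.

```python
from statistics import mean


def accuracy(predictions, references):
    """Fraction of exact label matches; a plausible metric for MCQ or sentiment tasks."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    return mean(1.0 if p == r else 0.0 for p, r in zip(predictions, references))


def rubric_score(criterion_scores, max_per_criterion=5):
    """Normalize per-criterion ratings (assumed 0..max_per_criterion) to [0, 1]."""
    if not criterion_scores:
        raise ValueError("at least one rubric criterion is required")
    total = sum(criterion_scores.values())
    return total / (max_per_criterion * len(criterion_scores))


# Hypothetical data, for illustration only (not taken from SAHM).
mcq_accuracy = accuracy(["B", "C", "A", "D"], ["B", "C", "D", "D"])
open_ended_score = rubric_score(
    {"grounding": 4, "correctness": 5, "completeness": 3}
)

print(f"MCQ accuracy: {mcq_accuracy:.2f}")
print(f"Rubric score: {open_ended_score:.2f}")
```

Keeping closed-form tasks (exact-match accuracy) separate from open-ended ones (rubric ratings) makes it easier to see where a model's language fluency fails to carry over to evidence-grounded answers.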