Structured Prompting for Arabic Essay Proficiency: A Trait-Centric Evaluation Approach

arXiv cs.CL / 3/23/2026

Key Points

The paper introduces a three-tier prompting framework (standard, hybrid, rubric-guided) for trait-specific automatic essay scoring (AES) in Arabic using LLMs under zero-shot and few-shot settings.
It addresses the scarcity of Arabic AES tools and argues that structured prompting, rather than model size alone, is what enables trait-level evaluation of organization, vocabulary, development, and style.
The hybrid approach simulates multi-agent evaluation with trait-specialist raters, while rubric-guided prompting uses scored exemplars to improve alignment; eight LLMs were evaluated on the QAES Arabic dataset.
Rubric-guided prompting yields consistent gains across traits and models, with Development and Style showing the largest improvements; Fanar-1-9B-Instruct achieves the highest trait-level agreement (QWK 0.28, CI 0.41) in zero- and few-shot settings.
This work establishes the first comprehensive framework for proficiency-oriented Arabic AES and lays the groundwork for scalable assessment in low-resource educational contexts.
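The three prompting tiers described above can be illustrated with a minimal sketch. The prompt wording, the `build_prompt` helper, the default 1 to 5 scale, and the way each tier is assembled are all assumptions made for illustration; they are not the paper's actual prompts or the QAES scoring scale.

```python
# Hypothetical sketch of three prompt tiers for trait-level essay scoring.
# Wording, scale, and structure are illustrative assumptions, not the paper's prompts.

TRAITS = ("organization", "vocabulary", "development", "style")
TIERS = ("standard", "hybrid", "rubric-guided")


def build_prompt(essay, trait, tier, scale=(1, 5), rubric="", exemplars=()):
    """Assemble a scoring prompt for one trait under one prompting tier.

    exemplars is a sequence of (essay_text, score) pairs, used only by the
    rubric-guided tier.
    """
    if trait not in TRAITS:
        raise ValueError(f"unknown trait: {trait}")
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier}")

    low, high = scale
    parts = []

    if tier == "hybrid":
        # Assumed reading of "trait specialist raters": a per-trait persona.
        parts.append(
            f"You are a specialist rater who scores only the {trait} of Arabic essays."
        )

    if tier == "rubric-guided":
        # Assumed reading of "scored exemplars": rubric text plus scored examples.
        parts.append(f"Rubric for {trait}:\n{rubric}")
        for text, score in exemplars:
            parts.append(f"Example essay:\n{text}\nScore: {score}")

    parts.append(
        f"Rate the {trait} of the following Arabic essay on a scale from "
        f"{low} to {high}. Reply with the number only."
    )
    parts.append(f"Essay:\n{essay}")
    return "\n\n".join(parts)
```

Calling `build_prompt(essay, trait, "hybrid")` once per entry in `TRAITS` would give one specialist prompt per trait, which is one plausible way to approximate the multi-rater setup in a single-model pipeline.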

Abstract

This paper presents a novel prompt engineering framework for trait-specific Automatic Essay Scoring (AES) in Arabic, leveraging large language models (LLMs) under zero-shot and few-shot configurations. Addressing the scarcity of scalable, linguistically informed AES tools for Arabic, we introduce a three-tier prompting strategy (standard, hybrid, and rubric-guided) that guides LLMs in evaluating distinct language proficiency traits such as organization, vocabulary, development, and style. The hybrid approach simulates multi-agent evaluation with trait-specialist raters, while the rubric-guided method incorporates scored exemplars to enhance model alignment. We evaluate eight LLMs in zero-shot and few-shot settings on the QAES dataset, the first publicly available Arabic AES resource with trait-level annotations. Experimental results using Quadratic Weighted Kappa (QWK) and Confidence Intervals show that Fanar-1-9B-Instruct achieves the highest trait-level agreement in both zero-shot and few-shot prompting (QWK = 0.28 and CI = 0.41), with rubric-guided prompting yielding consistent gains across all traits and models. Discourse-level traits such as Development and Style showed the greatest improvements. These findings confirm that structured prompting, not model scale alone, enables effective AES in Arabic. Our study presents the first comprehensive framework for proficiency-oriented Arabic AES and sets the foundation for scalable assessment in low-resource educational contexts.
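For readers unfamiliar with the agreement metric, the following is a minimal pure-Python sketch of Quadratic Weighted Kappa between two sets of integer scores (for example, human and model ratings of one trait). It follows the standard definition of the metric; the function name, the score range arguments, and the handling of the degenerate case are assumptions here, and this is not the paper's evaluation code.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Cohen's kappa with quadratic weights for two lists of integer scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    k = max_score - min_score + 1  # number of score categories
    if k < 2:
        raise ValueError("need at least two score categories")
    n = len(rater_a)

    # Observed confusion matrix: rows are rater_a, columns are rater_b.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1

    # Marginal score histograms for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement penalty
            expected = hist_a[i] * hist_b[j] / n  # count expected by chance
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        # Both raters gave one identical score to every essay; treating this
        # as perfect agreement is a convention chosen for this sketch.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement, which helps put the reported QWK of 0.28 in context.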

