Instruction-Guided Poetry Generation in Arabic and Its Dialects

arXiv cs.CL / 5/1/2026


Key Points

  • The study proposes instruction-guided poetry generation for Arabic, extending prior LLM research that largely focused on analysis tasks such as interpretation and metadata prediction (e.g., rhyme schemes and titles).
  • It introduces a large, curated, instruction-based dataset covering Modern Standard Arabic and multiple dialects, enabling controllable tasks such as writing, revising, and continuing poems with constraints on style and rhyme.
  • The research also includes poetry analysis capabilities within the same instruction framework.
  • Experiments indicate that fine-tuning LLMs on the dataset yields models that generate poetry aligned with user requirements, according to both automated metrics and human evaluation by native Arabic speakers.
  • The dataset and code are publicly released at https://github.com/mbzuai-nlp/instructpoet-ar, supporting reproducibility and further development.

Abstract

Poetry has long been a central art form for Arabic speakers, serving as a powerful medium of expression and cultural identity. While modern Arabic speakers continue to value poetry, existing research on Arabic poetry within Large Language Models (LLMs) has primarily focused on analysis tasks such as interpretation or metadata prediction, e.g., rhyme schemes and titles. In contrast, our work addresses the practical aspect of poetry creation in Arabic by introducing controllable generation capabilities to assist users in writing poetry. Specifically, we present a large-scale, carefully curated instruction-based dataset in Modern Standard Arabic (MSA) and various Arabic dialects. This dataset enables tasks such as writing, revising, and continuing poems based on predefined criteria, including style and rhyme, as well as performing poetry analysis. Our experiments show that fine-tuning LLMs on this dataset yields models that can effectively generate poetry that is aligned with user requirements, based on both automated metrics and human evaluation with native Arabic speakers. The data and the code are available at https://github.com/mbzuai-nlp/instructpoet-ar
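
The generation tasks named in the abstract (writing, revising, and continuing poems under style and rhyme constraints) can be pictured as instruction records that are rendered into prompts. The following is only a minimal sketch under stated assumptions: the `PoetryInstruction` fields, the task labels, and the prompt wording are hypothetical illustrations and are not taken from the released dataset or code, whose actual schema may differ.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical task labels, mirroring the generation tasks listed in the abstract.
TASK_OPENERS = {
    "write": "Write a poem",
    "revise": "Revise the poem below",
    "continue": "Continue the poem below",
}


@dataclass
class PoetryInstruction:
    """Hypothetical instruction record; not the released dataset's schema."""
    task: str                     # one of the keys in TASK_OPENERS
    variety: str                  # e.g. "MSA" or a dialect label
    style: Optional[str] = None   # optional style constraint
    rhyme: Optional[str] = None   # optional rhyme constraint
    poem: Optional[str] = None    # existing verses (needed for revise/continue)


def build_prompt(item: PoetryInstruction) -> str:
    """Render a record as a plain-text instruction prompt."""
    if item.task not in TASK_OPENERS:
        raise ValueError(f"unknown task: {item.task!r}")
    if item.task in ("revise", "continue") and not item.poem:
        raise ValueError(f"task {item.task!r} requires existing verses in 'poem'")

    parts = [f"{TASK_OPENERS[item.task]} in {item.variety}."]
    if item.style:
        parts.append(f"Style: {item.style}.")
    if item.rhyme:
        parts.append(f"Rhyme: {item.rhyme}.")
    prompt = " ".join(parts)

    # Append the source verses after a blank line when the task operates on them.
    if item.poem:
        prompt += "\n\n" + item.poem
    return prompt
```

Keeping the constraints as separate optional fields, rather than baking them into free text, makes it straightforward to check afterwards whether a generated poem satisfies each requested constraint.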