GhazalBench: Usage-Grounded Evaluation of LLMs on Persian Ghazals
arXiv cs.CL / 3/12/2026
Key Points
- GhazalBench is a benchmark that evaluates LLMs on Persian ghazals under usage-grounded conditions, testing whether models can produce faithful paraphrases and access canonical verses.
- An evaluation of several proprietary and open-weight multilingual LLMs reveals a consistent dissociation: models generally capture poetic meaning but struggle to recall verses exactly in completion-based tasks, while recognition-based tasks narrow this gap.
- A comparable English sonnet benchmark shows markedly higher recall, suggesting that the recall limits stem from training exposure rather than architectural constraints.
- The authors advocate evaluation frameworks that jointly assess meaning, form, and cue-dependent access to culturally significant texts. GhazalBench is publicly available at the linked GitHub repository.
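
To make the difference between completion-based and recognition-based scoring concrete, here is a minimal Python sketch. It is not taken from the GhazalBench repository: the function names, the NFKC normalization step, and the multiple-choice framing of recognition are all assumptions chosen for illustration, and the sample strings are placeholders rather than real verses.

```python
import unicodedata


def normalize(text: str) -> str:
    """NFKC-normalize and collapse whitespace so trivial variants compare equal.

    Assumption: the benchmark's real normalization may differ.
    """
    return " ".join(unicodedata.normalize("NFKC", text).split())


def completion_recall(predictions, references):
    """Fraction of model completions that exactly match the canonical text."""
    if not references or len(predictions) != len(references):
        raise ValueError("need equally sized, non-empty lists")
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


def recognition_accuracy(chosen, correct):
    """Fraction of items where the model picked the canonical verse among candidates.

    Assumption: recognition is framed as multiple choice, with one index per item.
    """
    if not correct or len(chosen) != len(correct):
        raise ValueError("need equally sized, non-empty lists")
    hits = sum(c == g for c, g in zip(chosen, correct))
    return hits / len(correct)


# Placeholder data: two completion items and three recognition items.
print(completion_recall(["line  two", "wrong guess"], ["line two", "line four"]))
print(recognition_accuracy([1, 0, 2], [1, 0, 0]))
```

Exact match is deliberately strict: a completion that preserves the meaning but changes a single word scores zero. That strictness is one way the gap between capturing meaning and recalling the verse could show up in practice, and a recognition score only requires the model to identify the canonical text when it is presented.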