Different Strokes for Different Folks: Writer Identification for Historical Arabic Manuscripts
arXiv cs.LG / 4/27/2026
Key Points
- The study targets writer identification in handwritten historical Arabic manuscripts to support provenance research, authenticity checks, and historical and linguistic analysis.
- Using the Muharaf dataset, the authors expanded and cleaned the publicly available writer labels, manually verifying them and removing inconsistent or non-handwritten text; the result is a substantially larger set of writer-labeled lines.
- They propose a CNN-based attention model for closed-set writer identification and handle the rare lines written by two writers by assigning them composite writer-pair classes.
- Benchmarks across 14 configurations, together with ablations, show that performance drops sharply under the harder page-disjoint protocol (no page contributes lines to both training and test), highlighting the importance of page-level cues.
- The paper reports the first baselines for both the line-level and page-disjoint evaluation protocols and releases its implementation on GitHub for historians and linguists.
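
To make the two evaluation protocols and the composite-class idea concrete, here is a minimal Python sketch. It is not the authors' released code: the record layout (dicts with hypothetical `page` and `writers` fields), the helper names `encode_label`, `line_level_split` and `page_disjoint_split`, the `+`-joined composite label format, and the default 20% test fraction are all illustrative assumptions.

```python
import random
from collections.abc import Iterable


def encode_label(writers: Iterable[str]) -> str:
    """Map a line's writer ID(s) to one closed-set class label.

    Assumed scheme: single-writer lines keep their writer ID; a two-writer
    line becomes a composite writer-pair class such as "w0+w1". Sorting makes
    the pair order-independent, so ("w1", "w0") and ("w0", "w1") share a class.
    """
    return "+".join(sorted(writers))


def line_level_split(lines: list[dict], test_frac: float = 0.2, seed: int = 0):
    """Shuffle individual lines; one page can contribute to both splits."""
    rng = random.Random(seed)
    shuffled = list(lines)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]


def page_disjoint_split(lines: list[dict], test_frac: float = 0.2, seed: int = 0):
    """Hold out whole pages, so no test page is seen during training.

    Closed-set identification also needs every test class to exist in
    training, so test lines whose class never appears in train are dropped.
    """
    pages = sorted({ln["page"] for ln in lines})
    rng = random.Random(seed)
    rng.shuffle(pages)
    n_test = max(1, int(len(pages) * test_frac))
    test_pages = set(pages[:n_test])

    train = [ln for ln in lines if ln["page"] not in test_pages]
    seen = {encode_label(ln["writers"]) for ln in train}
    test = [
        ln for ln in lines
        if ln["page"] in test_pages and encode_label(ln["writers"]) in seen
    ]
    return train, test


if __name__ == "__main__":
    # Toy data: 10 pages, 3 writers, 5 lines per page, plus one two-writer line.
    lines = [
        {"page": f"p{p}", "writers": (f"w{p % 3}",)}
        for p in range(10)
        for _ in range(5)
    ]
    lines.append({"page": "p0", "writers": ("w1", "w0")})

    train, test = page_disjoint_split(lines, test_frac=0.3, seed=1)
    overlap = {ln["page"] for ln in train} & {ln["page"] for ln in test}
    print(len(train), len(test), overlap)  # overlap is always an empty set
```

The key design choice is that the page-disjoint split groups lines by page before shuffling, so cues shared by lines of the same page cannot leak from training into test, whereas the line-level split freely mixes lines from one page across both sides.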




