GhanaNLP Parallel Corpora: Comprehensive Multilingual Resources for Low-Resource Ghanaian Languages
arXiv cs.CL / 3/17/2026
News / Tools & Practical Usage
Key Points
- The GhanaNLP initiative has developed and curated 41,513 parallel sentence pairs that pair English with Twi, Fante, Ewe, Ga, and Kusaal, to support NLP for low-resource Ghanaian languages.
- The data were collected, translated, and annotated by human professionals and enriched with standard metadata to ensure consistency and usability.
- The corpora are designed for machine translation, speech technologies, and language preservation, and have been deployed in real-world applications such as the Khaya AI translation engine.
- This work contributes to democratizing AI by enabling inclusive and accessible language technologies for African languages.
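
To make the idea of a parallel corpus with metadata concrete, here is a minimal sketch of how one sentence pair might be represented and how pairs could be counted per language pair. The `ParallelPair` class, its field names, the `domain` metadata field, and the language codes are assumptions for illustration only; the summary does not describe the published schema, and the rows below are placeholders rather than real corpus data.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record layout: field names are assumptions, not the published schema.
@dataclass(frozen=True)
class ParallelPair:
    source_lang: str          # e.g. "en" for English
    target_lang: str          # e.g. "tw" for Twi, "ee" for Ewe (ISO 639-1 codes)
    source_text: str
    target_text: str
    domain: str = "unknown"   # example of a metadata field (assumed)


def count_by_language_pair(pairs):
    """Count sentence pairs for each (source, target) language combination."""
    return Counter((p.source_lang, p.target_lang) for p in pairs)


# Placeholder rows, not real corpus data.
sample = [
    ParallelPair("en", "tw", "<English sentence 1>", "<Twi sentence 1>"),
    ParallelPair("en", "tw", "<English sentence 2>", "<Twi sentence 2>"),
    ParallelPair("en", "ee", "<English sentence 3>", "<Ewe sentence 3>"),
]

counts = count_by_language_pair(sample)
for (src, tgt), n in sorted(counts.items()):
    print(f"{src}-{tgt}: {n} pairs")
```

Keeping the language codes and metadata on every record, rather than implied by file names, is one way to let the same loader serve machine translation training and per-language statistics alike.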