A Bolu: A Structured Dataset for the Computational Analysis of Sardinian Improvisational Poetry
arXiv cs.CL / 4/22/2026
Key Points
- The paper introduces A Bolu, the first structured computational corpus of extemporaneous (improvised) poetry, focused on cantada logudorese, the improvised sung poetry tradition of the Logudorese variety of Sardinian.
- The dataset includes 2,835 stanzas totaling 141,321 tokens, addressing a methodological gap in preserving and analyzing oral linguistic heritage with NLP.
- The study outlines the corpus architecture and applies multidimensional computational linguistic methods plus descriptive statistics to characterize the poetic text.
- Findings show recurring production patterns among Sardinian improvisational poets that align with Parry and Lord’s theory of formulaicity.
- The authors argue the resource helps both scholarly understanding of oral creativity and the development of more inclusive NLP tools for less widely spoken languages.
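To make the descriptive-statistics and formulaicity points concrete, here is a minimal Python sketch. It is not the authors' pipeline: the function names, the whitespace tokenization, and the use of repeated word n-grams as a rough proxy for formulaic phrasing are all assumptions made for illustration, and the two toy stanzas are English placeholders, not corpus data. Only the stanza and token counts come from the summary above.

```python
from collections import Counter

# Corpus size as reported in the summary above.
STANZAS = 2835
TOKENS = 141321


def mean_tokens_per_stanza(tokens: int, stanzas: int) -> float:
    """Average stanza length in tokens."""
    return tokens / stanzas


def repeated_ngrams(stanzas, n=3, min_count=2):
    """Count word n-grams that occur at least `min_count` times.

    Hypothetical helper: recurring n-grams are only a crude stand-in
    for formulaic expressions, and splitting on whitespace is an
    assumption, not the corpus's actual tokenization.
    """
    counts = Counter()
    for stanza in stanzas:
        words = stanza.lower().split()
        counts.update(
            tuple(words[i:i + n]) for i in range(len(words) - n + 1)
        )
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Placeholder stanzas (invented English text, NOT taken from A Bolu).
toy_stanzas = [
    "the night is long and the road is far",
    "the night is long but the song is near",
]

average = mean_tokens_per_stanza(TOKENS, STANZAS)
formulas = repeated_ngrams(toy_stanzas, n=3)

print(f"Mean tokens per stanza: {average:.2f}")
for gram, count in sorted(formulas.items()):
    print(" ".join(gram), count)
```

On the reported figures, the mean works out to roughly 50 tokens per stanza. A real analysis of formulaicity would likely need lemmatization, metrical position, and per-poet comparisons, none of which this sketch attempts.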