A Catalog of Basque Dialectal Resources: Online Collections and Standard-to-Dialectal Adaptations
arXiv cs.CL / 3/27/2026
Key Points
- The paper presents a systematic catalog of contemporary Basque dialectal NLP resources. It addresses data scarcity by bringing together dialectal data already available online and data produced through standard-to-dialect adaptation.
- It distinguishes two resource types: data originally written in a dialect (e.g., news, radio content, informal tweets, and reference materials such as dictionaries, atlases, grammars, and videos) and data adapted from standard Basque into dialects.
- For manual adaptation, the authors built a high-quality parallel gold evaluation dataset by adapting the XNLI test split into the Western, Central, and Navarrese-Lapurdian dialects.
- For automatic adaptation, they evaluate an automatically adapted physical commonsense dataset (BasPhyCowest), with additional native-speaker review, to judge whether it can replace fully manual “silver” data creation.