Giving Voice to the Constitution: Low-Resource Text-to-Speech for Quechua and Spanish Using a Bilingual Legal Corpus

arXiv cs.AI / 4/16/2026


Key Points

  • The paper introduces a unified, bilingual text-to-speech pipeline that generates high-quality Quechua and Spanish speech for Peru’s Constitution using XTTS v2, F5-TTS, and DiFlow-TTS.
  • It trains the models on separate Spanish and Quechua speech datasets of differing sizes and recording conditions, then leverages bilingual/multilingual TTS capabilities to improve output quality in both languages.
  • Cross-lingual transfer is used to reduce the impact of Quechua data scarcity while maintaining naturalness in Spanish.
  • The authors release trained checkpoints, inference code, and synthesized audio for each constitutional article, positioning the work as a reusable resource for indigenous and multilingual TTS.
  • Overall, the research targets more inclusive speech technology for political and legal content in low-resource linguistic settings.

Abstract

We present a unified pipeline for synthesizing high-quality Quechua and Spanish speech for the Peruvian Constitution using three state-of-the-art text-to-speech (TTS) architectures: XTTS v2, F5-TTS, and DiFlow-TTS. Our models are trained on independent Spanish and Quechua speech datasets with heterogeneous sizes and recording conditions, and leverage bilingual and multilingual TTS capabilities to improve synthesis quality in both languages. By exploiting cross-lingual transfer, our framework mitigates data scarcity in Quechua while preserving naturalness in Spanish. We release trained checkpoints, inference code, and synthesized audio for each constitutional article, providing a reusable resource for speech technologies in indigenous and multilingual contexts. This work contributes to the development of inclusive TTS systems for political and legal content in low-resource settings.
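The released pipeline pairs every constitutional article with both target languages before handing it to a TTS backend. The paper does not publish this orchestration code, so the sketch below is purely illustrative: the names `SynthesisJob`, `build_jobs`, and the directory layout are assumptions, and the actual synthesis call (e.g., against an XTTS v2 checkpoint) is deliberately left out.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class SynthesisJob:
    """One article/language pair awaiting synthesis (hypothetical structure)."""
    article_id: int
    language: str       # ISO 639-1 code: "es" (Spanish) or "qu" (Quechua)
    text: str
    output_path: Path

def build_jobs(articles: dict[int, dict[str, str]],
               out_dir: str = "audio") -> list[SynthesisJob]:
    """Pair every constitutional article with each available language version.

    `articles` maps article number -> {language code: article text}.
    A downstream TTS backend (e.g., an XTTS v2 or F5-TTS checkpoint from the
    released models) would consume these jobs; that call is not shown here.
    """
    jobs = []
    for article_id, versions in sorted(articles.items()):
        for lang, text in sorted(versions.items()):
            jobs.append(SynthesisJob(
                article_id=article_id,
                language=lang,
                text=text,
                # One subdirectory per language, zero-padded article numbers.
                output_path=Path(out_dir) / lang / f"article_{article_id:03d}.wav",
            ))
    return jobs
```

Keeping job construction separate from inference makes it easy to swap backends per language, which matters here since the Spanish and Quechua models are trained on independent datasets.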