Vietnamese Automatic Speech Recognition: A Revisit
arXiv cs.CL / 3/17/2026
Key Points
- The paper addresses quality and annotation inconsistencies across open-source ASR datasets by proposing a robust data aggregation and preprocessing pipeline.
- The pipeline is demonstrated on Vietnamese, producing a unified 500-hour dataset with word-level timestamps for model training and evaluation.
- The authors emphasize data diversity and balance as a way to improve the robustness of ASR systems for low-resource languages.
- A GitHub project page (PhoASR) shows the pipeline's generalizability and explains how to replicate the results.
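
To make the idea of aggregating inconsistently annotated corpora more concrete, here is a minimal sketch in Python. It is not the PhoASR pipeline: the record fields (`audio`, `text`, `duration`), the function names, and the normalization rules (NFC, lowercasing, punctuation stripping) are all assumptions chosen for illustration, and the summary above does not describe the paper's actual preprocessing steps.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_transcript(text):
    """Hypothetical normalization: NFC, lowercase, drop punctuation."""
    # NFC matters for Vietnamese: decomposed diacritics would otherwise
    # be stripped by the punctuation filter below.
    text = unicodedata.normalize("NFC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def aggregate(sources):
    """Merge per-source record lists into one list with a common schema.

    `sources` maps a source name to a list of dicts with the assumed
    keys "audio", "text", and "duration" (seconds).
    """
    seen = set()
    unified = []
    for source_name, records in sources.items():
        for rec in records:
            text = normalize_transcript(rec["text"])
            key = (rec["audio"], text)
            # Skip empty transcripts and exact duplicates.
            if not text or key in seen:
                continue
            seen.add(key)
            unified.append({
                "source": source_name,
                "audio": rec["audio"],
                "text": text,
                "duration": float(rec["duration"]),
            })
    return unified


def hours_by_source(unified):
    """Total hours per source, a rough way to inspect dataset balance."""
    totals = defaultdict(float)
    for rec in unified:
        totals[rec["source"]] += rec["duration"] / 3600.0
    return dict(totals)
```

Calling `hours_by_source(aggregate(sources))` on a dictionary of per-corpus records gives a quick view of how much each source contributes, which is one simple way to check the balance the authors emphasize.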




