Is Semi-Automatic Transcription Useful in Corpus Creation? Preliminary Considerations on the KIParla Corpus
arXiv cs.CL / 3/18/2026
Key Points
- The paper analyzes the use of Automatic Speech Recognition (ASR) within the transcription workflow of the KIParla corpus, a resource of spoken Italian.
- In a two-phase experiment, 11 transcribers of varying expertise produced both manual and ASR-assisted transcriptions of identical audio segments across three conversation types.
- The results show that ASR-assisted workflows can increase transcription speed but do not consistently improve overall accuracy, with outcomes depending on workflow configuration, conversation type, and annotator experience.
- The study combines alignment-based metrics, descriptive statistics, and statistical modeling to monitor transcription behavior across annotators and workflows.
- Despite these limitations, ASR-assisted transcription, potentially supported by task-specific fine-tuning, could be integrated into the KIParla workflow to accelerate corpus creation without compromising transcription quality.
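
To make "alignment-based metrics" concrete, here is a minimal sketch of one common metric of that family, word error rate (WER), computed from a word-level Levenshtein alignment. This summary does not say which metrics the paper actually uses, so WER is an assumption chosen for illustration. The function name `word_error_rate` and the Italian example strings are hypothetical and not taken from the KIParla data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference tokens.

    Illustrative only: tokenization is plain whitespace splitting, which
    ignores the prosodic and overlap markup a real spoken corpus would carry.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference tokens to reach an empty hypothesis
        for j, hyp_tok in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,                      # deletion (token missing in hypothesis)
                curr[j - 1] + 1,                  # insertion (extra token in hypothesis)
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: a manual transcription used as reference, compared
# with an ASR-assisted one. One substitution and one deletion give
# 2 edits / 4 reference tokens = 0.5.
manual = "ciao come stai oggi"
assisted = "ciao come sta"
print(word_error_rate(manual, assisted))
```

In a comparison like the one described above, a score of this kind could be computed per audio segment for each transcriber and workflow, then fed into the descriptive statistics and models. Which transcription serves as the reference is itself a design decision that this sketch leaves open.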




