Is Semi-Automatic Transcription Useful in Corpus Creation? Preliminary Considerations on the KIParla Corpus
arXiv cs.CL / 3/18/2026
Key Points
- The paper analyzes the use of Automatic Speech Recognition (ASR) within the transcription workflow of the KIParla corpus, a resource of spoken Italian.
- In a two-phase experiment, 11 transcribers of varying expertise produced both manual and ASR-assisted transcriptions of identical audio segments across three conversation types.
- The results show that ASR-assisted workflows can increase transcription speed but do not consistently improve overall accuracy, with outcomes depending on workflow configuration, conversation type, and annotator experience.
- The study combines alignment-based metrics, descriptive statistics, and statistical modeling to monitor transcription behavior across annotators and workflows.
- Despite these limitations, ASR-assisted transcription, potentially supported by task-specific fine-tuning, could be integrated into the KIParla workflow to accelerate corpus creation without compromising transcription quality.
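To make the idea of an alignment-based metric concrete, the sketch below computes a word-level edit distance and a word error rate between two transcriptions of the same segment. This is an illustration only: the summary does not say which metrics the paper uses, and the function names and the example sentences are hypothetical, not taken from KIParla.

```python
def word_edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance computed over whitespace-separated tokens."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] is the distance between the first i-1 reference tokens
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the number of reference tokens."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference must contain at least one token")
    return word_edit_distance(reference, hypothesis) / ref_len


# Hypothetical example: a manual transcription used as the reference and
# an ASR-assisted transcription of the same segment as the hypothesis.
manual = "allora ci vediamo domani mattina"
assisted = "allora ci vediamo domani"
print(word_error_rate(manual, assisted))
```

Real spoken-corpus transcriptions also carry conventions for overlaps, pauses, and prosody, so a comparison of this kind would need a normalization step chosen for the corpus before tokens are aligned.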