AppTek Call-Center Dialogues: A Multi-Accent Long-Form Benchmark for English ASR
arXiv cs.CL / 5/1/2026
Key Points
- The AppTek Call-Center Dialogues dataset addresses a key gap in English ASR evaluation by providing spontaneous, role-played long-form call-center conversations with explicit multi-accent coverage.
- The corpus spans 14 English accents and 16 service-oriented scenarios; it was commissioned specifically for evaluation, with no prior public release of the audio or transcripts, to limit overlap with existing pretraining data.
- The study benchmarks multiple open-source ASR systems and varies the segmentation approach to test how preprocessing choices affect recognition quality.
- Findings show substantial performance differences across accents and segmentation methods, demonstrating that strong results on general American English benchmarks do not necessarily transfer to other dialects.
- Overall, the work provides a more realistic and robust benchmark for conversational AI use cases that require handling diverse speakers and longer dialogue contexts.
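The per-accent comparisons described above rest on word error rate (WER), the standard ASR metric. As a rough illustration of how such a benchmark is scored, the sketch below computes WER via word-level edit distance and reports it per accent; the sample utterances and accent labels are invented for illustration, not drawn from the AppTek corpus, and the paper itself may use a different toolkit or normalization.

```python
# Minimal sketch of per-accent WER scoring, the core measurement behind
# multi-accent ASR benchmarks. Sample data is hypothetical, not from the
# AppTek Call-Center Dialogues corpus.

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over word tokens / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[-1][-1] / max(len(ref), 1)

# Hypothetical (accent, reference transcript, ASR hypothesis) triples.
samples = [
    ("en-US", "i would like to cancel my order", "i would like to cancel my order"),
    ("en-IN", "i would like to cancel my order", "i would like to cancel order"),
]
for accent, ref, hyp in samples:
    print(f"{accent}: WER = {wer(ref, hyp):.2f}")
```

In a real evaluation the same aggregation would run over full long-form dialogues, which is where segmentation choices start to shift the scores the paper reports.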