Cohere Transcribe WebGPU: state-of-the-art multilingual speech recognition in your browser

Reddit r/LocalLLaMA / 3/28/2026


Key Points

  • Cohere released its first speech-to-text model, which is reported to top the OpenASR leaderboard (at least for English) while supporting 14 languages.
  • A developer built a WebGPU demo that runs the transcription model entirely locally in the browser using Transformers.js.
  • The demo and its source code are published on Hugging Face Spaces, enabling others to test and build similar client-side speech recognition experiences.
  • The release highlights the growing feasibility of high-performing multilingual ASR models running on-device, which can improve privacy and reduce latency for browser-based apps.

Yesterday, Cohere released their first speech-to-text model, which now tops the OpenASR leaderboard (for English, but the model does support 14 different languages).

So, I decided to build a WebGPU demo for it: the model runs entirely locally in the browser with Transformers.js. I hope you like it!

Link to demo (+ source code): https://huggingface.co/spaces/CohereLabs/Cohere-Transcribe-WebGPU
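For context, a client-side setup like the demo describes typically boils down to a few lines of Transformers.js. The sketch below uses the library's standard `pipeline` API with the `webgpu` device option; the model ID and audio file are placeholders, not necessarily what the linked demo uses:

```javascript
// Minimal browser sketch: local speech-to-text with Transformers.js.
// Everything runs client-side; model weights are fetched once, then cached.
import { pipeline } from "@huggingface/transformers";

// Hypothetical model ID -- substitute the actual Cohere transcription
// checkpoint from the Hugging Face Hub.
const transcriber = await pipeline(
  "automatic-speech-recognition",
  "CohereLabs/cohere-transcribe",  // assumption, not the verified ID
  { device: "webgpu" }             // requires a WebGPU-capable browser
);

// Transcribe an audio file by URL; raw Float32Array PCM also works.
const output = await transcriber("audio.wav");
console.log(output.text);
```

Because inference happens on-device, no audio leaves the browser, which is the privacy/latency upside the summary above points to.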

submitted by /u/xenovatech