AI Navigate

Voxtral WebGPU: Real-time speech transcription entirely in your browser with Transformers.js

Reddit r/LocalLLaMA / 3/12/2026

📰 News · Tools & Practical Usage · Models & Research

Key Points

  • Mistral released Voxtral-Mini-4B-Realtime, a multilingual speech transcription model supporting 13 languages with latency under 500 milliseconds.
  • Transformers.js now supports this model, enabling real-time speech transcription directly in the browser using WebGPU technology.
  • This approach allows live captioning to be performed entirely locally without server-side processing, enhancing privacy and reducing latency.
  • A demo and source code are available on Hugging Face Spaces for users to try out and integrate.
  • This development highlights advances in running efficient large models on client devices using modern web technologies like WebGPU.

Mistral recently released Voxtral-Mini-4B-Realtime, a multilingual, real-time speech-transcription model that supports 13 languages and is capable of <500 ms latency. Today, we added support for it to Transformers.js, enabling live captioning entirely locally in the browser on WebGPU. Hope you like it!
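For context, Transformers.js exposes speech recognition through its standard `pipeline` API, and WebGPU execution is selected with the `device` option. A minimal sketch of that flow is below; it uses a small Whisper checkpoint as a stand-in, since the post does not give the exact Transformers.js model id for the Voxtral checkpoint, and the microphone-capture details of the actual demo are not shown here (see the linked Space for the real source).

```javascript
// Browser-side sketch of local transcription with Transformers.js on WebGPU.
// "Xenova/whisper-tiny.en" is a stand-in model id; the Voxtral checkpoint id
// used by the demo is an assumption not confirmed by this post.
import { pipeline } from "@huggingface/transformers";

// Load the model once; weights are fetched and cached in the browser,
// and inference runs on the local GPU via WebGPU.
const transcriber = await pipeline(
  "automatic-speech-recognition",
  "Xenova/whisper-tiny.en",
  { device: "webgpu" }
);

// Transcribe a chunk of audio: a mono Float32Array sampled at 16 kHz,
// e.g. captured from getUserMedia + an AudioContext.
async function transcribe(audioFloat32Array) {
  const { text } = await transcriber(audioFloat32Array);
  return text;
}
```

Because everything (model weights, audio, inference) stays in the page, no audio ever leaves the device, which is where the privacy and latency benefits in the summary above come from.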

Link to demo (+ source code): https://huggingface.co/spaces/mistralai/Voxtral-Realtime-WebGPU

submitted by /u/xenovatech