Voxtral WebGPU: Real-time speech transcription entirely in your browser with Transformers.js
Reddit r/LocalLLaMA / 3/12/2026
📰 News · Tools & Practical Usage · Models & Research

> Mistral recently released Voxtral-Mini-4B-Realtime, a multilingual, realtime speech-transcription model that supports 13 languages and is capable of <500 ms latency. Today, we added support for it to Transformers.js, enabling live captioning entirely locally in the browser on WebGPU. Hope you like it! Link to demo (+ source code): https://huggingface.co/spaces/mistralai/Voxtral-Realtime-WebGPU
Key Points
- Mistral released Voxtral-Mini-4B-Realtime, a multilingual speech transcription model supporting 13 languages with latency under 500 milliseconds.
- Transformers.js now supports this model, enabling real-time speech transcription directly in the browser using WebGPU technology.
- This approach allows live captioning to be performed entirely locally without server-side processing, enhancing privacy and reducing latency.
- A demo and source code are available on Hugging Face Spaces for users to try out and integrate.
- This development highlights advances in running efficient large models on client devices using modern web technologies like WebGPU.
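For readers wondering what the browser-side setup looks like, here is a minimal sketch using the Transformers.js `pipeline` API with WebGPU. The model id, `dtype` choice, and audio-chunking details are assumptions for illustration; the demo's actual streaming implementation may differ, so consult its source code linked above.

```javascript
// Hypothetical model id — the real repo name may differ.
const MODEL_ID = "mistralai/Voxtral-Mini-4B-Realtime";

// Pipeline options for WebGPU inference. "q4" quantization is an
// assumption; pick a dtype your device supports.
function webgpuOptions(dtype = "q4") {
  return { device: "webgpu", dtype };
}

async function startCaptioning() {
  // Dynamic import keeps this file loadable outside the browser.
  const { pipeline } = await import("@huggingface/transformers");

  // Load the ASR model entirely client-side; weights are fetched
  // once and cached by the browser.
  const transcriber = await pipeline(
    "automatic-speech-recognition",
    MODEL_ID,
    webgpuOptions(),
  );

  // Capture microphone audio; chunking and feeding audio frames to
  // the model for low-latency streaming is left to the demo's logic.
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  return { transcriber, stream };
}
```

Because inference runs on the user's GPU via WebGPU, no audio ever leaves the machine — which is where the privacy and latency benefits in the points above come from.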