Voxtral WebGPU: Real-time speech transcription entirely in your browser with Transformers.js
Reddit r/LocalLLaMA / 3/12/2026
📰 News · Tools & Practical Usage · Models & Research

"Mistral recently released Voxtral-Mini-4B-Realtime, a multilingual, realtime speech-transcription model that supports 13 languages and is capable of <500 ms latency. Today, we added support for it to Transformers.js, enabling live captioning entirely locally in the browser on WebGPU. Hope you like it! Link to demo (+ source code): https://huggingface.co/spaces/mistralai/Voxtral-Realtime-WebGPU"
Key Points
- Mistral released Voxtral-Mini-4B-Realtime, a multilingual speech transcription model supporting 13 languages with latency under 500 milliseconds.
- Transformers.js now supports this model, enabling real-time speech transcription directly in the browser using WebGPU technology.
- This approach allows live captioning to be performed entirely locally without server-side processing, enhancing privacy and reducing latency.
- A demo and source code are available on Hugging Face Spaces for users to try out and integrate.
- This development highlights advances in running efficient large models on client devices using modern web technologies like WebGPU.
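For readers who want to try this pattern themselves, the key points above can be sketched in a few lines of Transformers.js. This is a minimal, hedged sketch: the `pipeline` function and the `device: "webgpu"` option are part of the Transformers.js (`@huggingface/transformers`) API, but whether the Voxtral model id shown here loads through the `"automatic-speech-recognition"` pipeline task is an assumption based on the post, not verified here.

```javascript
// Pick the Transformers.js execution backend based on browser capability:
// WebGPU when available, WebAssembly otherwise.
function pickDevice(hasWebGPU) {
  return hasWebGPU ? "webgpu" : "wasm";
}

// Sketch of creating an in-browser transcriber (assumption: the Voxtral
// model id is exposed via the "automatic-speech-recognition" task).
async function createTranscriber() {
  const { pipeline } = await import("@huggingface/transformers");
  const device = pickDevice(typeof navigator !== "undefined" && !!navigator.gpu);
  return pipeline(
    "automatic-speech-recognition",
    "mistralai/Voxtral-Mini-4B-Realtime", // model id taken from the post
    { device }
  );
}
```

In a real page you would then feed microphone audio (e.g. captured via `getUserMedia`) to the returned transcriber; everything runs client-side, which is what gives the privacy and latency benefits described above.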
Related Articles

Manus brings AI agents to the desktop, enabling direct operation of files and apps on a local PC
Ledge.ai

The programming passion is melting
Dev.to

Best AI Tools for Property Managers in 2026
Dev.to

Building "The Sentinel": AI Parametric Insurance at Guidewire DEVTrails
Dev.to

Maximize Developer Revenue with Monetzly's Innovative API for AI Conversations
Dev.to