Voxtral WebGPU: Real-time speech transcription entirely in your browser with Transformers.js
Reddit r/LocalLLaMA / 3/12/2026
📰 News · Tools & Practical Usage · Models & Research

> Mistral recently released Voxtral-Mini-4B-Realtime, a multilingual, realtime speech-transcription model that supports 13 languages and is capable of <500 ms latency. Today, we added support for it to Transformers.js, enabling live captioning entirely locally in the browser on WebGPU. Hope you like it! Link to demo (+ source code): https://huggingface.co/spaces/mistralai/Voxtral-Realtime-WebGPU
Key Points
- Mistral released Voxtral-Mini-4B-Realtime, a multilingual speech transcription model supporting 13 languages with latency under 500 milliseconds.
- Transformers.js now supports this model, enabling real-time speech transcription directly in the browser using WebGPU technology.
- This approach allows live captioning to be performed entirely locally without server-side processing, enhancing privacy and reducing latency.
- A demo and source code are available on Hugging Face Spaces for users to try out and integrate.
- This development highlights advances in running efficient large models on client devices using modern web technologies like WebGPU.
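For readers wondering what the browser-side setup looks like, here is a minimal sketch using the Transformers.js `pipeline` API with WebGPU. The model id, `dtype` choice, and audio-chunking details are assumptions for illustration; the demo's actual streaming implementation may differ, so consult its source code linked above.

```javascript
// Hypothetical model id — the real repo name may differ.
const MODEL_ID = "mistralai/Voxtral-Mini-4B-Realtime";

// Pipeline options for WebGPU inference. "q4" quantization is an
// assumption; pick a dtype your device supports.
function webgpuOptions(dtype = "q4") {
  return { device: "webgpu", dtype };
}

async function startCaptioning() {
  // Dynamic import keeps this file loadable outside the browser.
  const { pipeline } = await import("@huggingface/transformers");

  // Load the ASR model entirely client-side; weights are fetched
  // once and cached by the browser.
  const transcriber = await pipeline(
    "automatic-speech-recognition",
    MODEL_ID,
    webgpuOptions(),
  );

  // Capture microphone audio; chunking and feeding audio frames to
  // the model for low-latency streaming is left to the demo's logic.
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  return { transcriber, stream };
}
```

Because inference runs on the user's GPU via WebGPU, no audio ever leaves the machine — which is where the privacy and latency benefits in the points above come from.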