I recently worked on BlueTTS, a lightweight text-to-speech model that focuses on speed and usability.
It supports English, Hebrew, Russian, Spanish, and French, including mixing languages within a single sentence, and ships with a large set of voices out of the box.
The model reaches up to 1500× real-time speed on GPU and runs in real time on CPU, while staying small enough (~80 MB) to run on almost any machine.
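To put those speed figures in perspective, here is a minimal sketch of the arithmetic. It assumes "N× real-time" means N seconds of audio produced per second of compute. The function names are illustrative and not part of BlueTTS.

```python
def realtime_speedup(audio_seconds: float, synthesis_seconds: float) -> float:
    """Seconds of audio produced per second of compute."""
    if synthesis_seconds <= 0:
        raise ValueError("synthesis_seconds must be positive")
    return audio_seconds / synthesis_seconds


def synthesis_time(audio_seconds: float, speedup: float) -> float:
    """Compute time needed to produce audio_seconds at a given speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


one_hour = 3600.0  # seconds of audio

# At the quoted peak of 1500x (GPU), an hour of audio is a matter of seconds.
gpu_seconds = synthesis_time(one_hour, 1500.0)

# At 1x (real time on CPU), it takes as long as the audio itself.
cpu_seconds = synthesis_time(one_hour, 1.0)

print(f"GPU: {gpu_seconds:.1f} s, CPU: {cpu_seconds:.0f} s")
```

Since 1500× is an "up to" figure, actual throughput will depend on hardware, batch size, and text length.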
Everything is fully open-source, including the training pipeline :)
Contributions are welcome, for example adding llama.cpp support.
You can check it out here:



