Qwen3 TTS in C++ with 1.7B support, speaker encoding extraction, and desktop UI

Reddit r/LocalLLaMA / 3/15/2026



Key Points

A C++ fork of Qwen3 TTS adds 1.7B model support, speaker encoding extraction, a JNI interface, and speaker instructions for custom voice models, plus voice cloning with both the 0.6B and 1.7B base models.
A desktop application UI was built with Kotlin Multiplatform (qwen-tts-studio) to run and test TTS locally on Windows and Linux.
The project must be compiled from source, and models must be converted to GGUF manually, so setup is a do-it-yourself workflow.
The post presents the GitHub repos and a preview image, framing the work as a still-in-progress contribution shared for feedback.

I've spent the last few weekends working on a Qwen3 TTS implementation. It is a fork of https://github.com/predict-woo/qwen3-tts.cpp, but with more features and a cleaner codebase: https://github.com/Danmoreng/qwen3-tts.cpp

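It currently supports:

1.7B models
Speaker encoding extraction
A JNI interface
Speaker instructions for custom voice models
Voice cloning with both the 0.6B and 1.7B base models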

I also built a desktop app UI for it using Kotlin Multiplatform: qwen-tts-studio

The app must be compiled from source; it works on Windows and Linux. Models still need to be converted to GGUF manually.
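Since the conversion is manual, a quick sanity check on the output file can catch a failed or truncated conversion before loading it in the app. The sketch below is not part of either repo. It only reads the fixed-size GGUF header (magic bytes, version, tensor count, metadata key/value count) and assumes a little-endian file in GGUF version 2 or later, where both counts are 64-bit. The function name `read_gguf_header` is made up for illustration.

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF header.

    Assumes little-endian GGUF v2 or later (uint32 version, two uint64 counts).
    """
    with open(path, "rb") as f:
        header = f.read(24)  # 4-byte magic + 4-byte version + 2 x 8-byte counts
    if len(header) < 24 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    return version, tensor_count, kv_count
```

Calling `read_gguf_header("model.gguf")` on a converted file (the filename is a placeholder) should return a small version number and non-zero counts. A `ValueError` means the file is not GGUF at all or was cut off before the header finished.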

Both repos are missing a bit of polish. However, they are in a state where I feel comfortable posting them here.
