
Qwen3 TTS in C++ with 1.7B support, speaker encoding extraction, and desktop UI

Reddit r/LocalLLaMA / 3/15/2026


Key Points

  • A fork of Qwen3 TTS in C++ adds 1.7B model support, speaker encoding extraction, a JNI interface, and speaker instructions for custom voice models, including voice cloning for 0.6B and 1.7B bases.
  • A desktop application UI was built with Kotlin Multiplatform (qwen-tts-studio) to run and test TTS locally on Windows and Linux.
  • The project must be compiled from source, and models must be converted to GGUF manually, so setup is hands-on.
  • The post presents the GitHub repos and a preview image, framing the work as a still-in-progress contribution shared for feedback.

I've spent the last few weekends working on a Qwen3 TTS implementation. It is a fork of https://github.com/predict-woo/qwen3-tts.cpp with more features and a cleaner codebase: https://github.com/Danmoreng/qwen3-tts.cpp

It currently supports:

  • the 1.7B model
  • speaker encoding extraction
  • a JNI interface
  • speaker instructions (custom voice models)
  • voice cloning with both base models (0.6B and 1.7B)

I also built a desktop app UI for it using Kotlin Multiplatform:

https://github.com/Danmoreng/qwen-tts-studio

https://preview.redd.it/due94cp1m1pg1.png?width=2142&format=png&auto=webp&s=11ab89e23c842653c5ca0de383725008db271ec1

The app must be compiled from source; it works on Windows and Linux. Models still need to be converted to GGUF manually.
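Since the conversion is manual, a quick sanity check on the output file can catch a failed or truncated conversion before loading it in the app. The sketch below is not part of either repo. It only reads the fixed GGUF header (the 4-byte magic `GGUF`, a version number, and the tensor and metadata counts), and it assumes a little-endian file in GGUF version 2 or later. The function name `read_gguf_header` and the file name in the usage note are made up for illustration.

```python
import struct

GGUF_MAGIC = b"GGUF"   # first four bytes of every GGUF file
HEADER_SIZE = 24       # magic (4) + version (4) + tensor count (8) + metadata count (8)


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) for a GGUF file.

    Assumes a little-endian file in GGUF version 2 or later, where both
    counts are stored as 64-bit unsigned integers.
    Raises ValueError if the file is too short or the magic does not match.
    """
    with open(path, "rb") as f:
        header = f.read(HEADER_SIZE)

    if len(header) < HEADER_SIZE:
        raise ValueError("file is too short to contain a GGUF header")
    if header[:4] != GGUF_MAGIC:
        raise ValueError("missing GGUF magic; this is not a GGUF file")

    # "<IQQ": little-endian uint32 version, uint64 tensor count, uint64 metadata count
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:HEADER_SIZE])
    return version, tensor_count, kv_count
```

Calling `read_gguf_header("qwen3-tts-1.7b.gguf")` (a hypothetical file name) returns the three header fields. A `ValueError`, or a tensor count of zero, suggests the conversion did not finish correctly.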

Both repos are still missing a bit of polish. However, they are in a state that I feel comfortable posting here.

submitted by /u/Danmoreng