Hi Everyone,
I just tried this with the help of Claude because I am not very familiar with CMD, PowerShell, etc.
Tried to build a local Bulgarian audiobook voice cloner — here's what actually happened
Spent a full day trying to clone my voice locally and use it to read a book in Bulgarian. Here's the honest breakdown.
My setup: RTX 5070 Ti, 64GB RAM, Windows 11
Attempt 1: XTTS-v2 (Coqui TTS)
Looked promising — voice cloning from just 30 seconds of audio, runs locally, free. Got it installed after fighting some transformers version conflicts. Generated audio successfully.
Result: it sounds Russian. Not even close to Bulgarian. XTTS-v2 officially supports 17 languages and Bulgarian isn't one of them. Using language="ru" is the community workaround, but the output is clearly Russian-accented. The similarity to my actual voice was also poor regardless of language.
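For anyone who wants to reproduce the language="ru" workaround, here is a minimal sketch of how it could look with the Coqui TTS Python API. The helper names (pick_xtts_language, clone_to_file) are made up for illustration, and the language set is my reading of the XTTS-v2 model card, so double-check it against your installed version. It assumes a CUDA GPU and a short reference recording of your voice.

```python
# Language codes as listed on the XTTS-v2 model card (assumption: verify locally).
# Bulgarian ("bg") is not in the list.
XTTS_V2_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
}


def pick_xtts_language(wanted: str, fallback: str = "ru") -> str:
    """Return `wanted` if XTTS-v2 lists it, otherwise the fallback code."""
    code = wanted.lower()
    return code if code in XTTS_V2_LANGUAGES else fallback


def clone_to_file(text: str, speaker_wav: str, out_path: str,
                  wanted_language: str = "bg") -> None:
    """Hypothetical wrapper: synthesize `text` in the voice from `speaker_wav`."""
    # Imported lazily so pick_xtts_language works without Coqui TTS installed.
    from TTS.api import TTS

    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to("cuda")
    tts.tts_to_file(
        text=text,
        speaker_wav=speaker_wav,
        # "bg" is unsupported, so this falls back to "ru" (Russian accent).
        language=pick_xtts_language(wanted_language),
        file_path=out_path,
    )
```

Calling clone_to_file("Здравейте", "my_voice.wav", "out.wav") would use Russian phonetics for the Bulgarian text, which is why the result sounds Russian. The file names here are placeholders.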
Attempt 2: Fish Speech 1.5
More promising on paper — trained on 80+ languages including Cyrillic scripts, no language-specific preprocessing needed. Got it installed. Still working through some model loading issues on Windows.
What made everything harder than it should be:
The RTX 5070 Ti (Blackwell architecture) isn't supported by stable PyTorch yet. Had to use nightly builds. Every single package install would silently downgrade PyTorch back to 2.5.1, breaking GPU support. Had to force reinstall the nightly after almost every step.
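A quick check like the one below, run after every install step, can tell you whether pip has swapped the nightly for a stable build again. This is only a sketch: the function names are hypothetical, the example version string in the docstring is illustrative, and I am assuming nightly builds carry a ".dev" date tag in their version. As far as I know, Blackwell consumer cards need "sm_120" in the CUDA arch list, but treat that as an assumption too.

```python
import re


def looks_like_nightly(version: str) -> bool:
    """True for version strings shaped like '2.8.0.dev20250401+cu128'."""
    return re.search(r"\.dev\d+", version) is not None


def describe_torch() -> str:
    """Summarize the installed PyTorch build without crashing if it is missing."""
    try:
        import torch  # lazy import: the check still runs when torch is absent
    except ImportError:
        return "torch is not installed"
    kind = "nightly" if looks_like_nightly(torch.__version__) else "stable"
    # get_arch_list() reports the GPU architectures this build was compiled for.
    archs = torch.cuda.get_arch_list() if torch.cuda.is_available() else []
    return f"torch {torch.__version__} ({kind}), CUDA archs: {archs}"


if __name__ == "__main__":
    print(describe_torch())
```

If the output says "stable" or the arch list is empty, the last install probably replaced the nightly and it needs to be reinstalled.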
Bottom line so far:
There is no good free local TTS solution with voice cloning for Bulgarian right now. ElevenLabs supports it natively but it's paid beyond 10k characters. If anyone has actually solved this I'd love to know.
I appreciate any help or suggestions on what software I can use to create my own audiobooks with a good-sounding cloned voice.
I also tried ElevenLabs, but they want so much money for one small book that I can't imagine what one book of 1,000 pages would cost.
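To put a rough number on the 1,000-page book, here is a back-of-the-envelope sketch. The 1,800 characters per page is my own assumption (it varies a lot by book), and the price per 1,000 characters is a placeholder you would need to fill in from whatever plan you are looking at. It is not an actual ElevenLabs price.

```python
def book_characters(pages: int, chars_per_page: int = 1800) -> int:
    """Estimate total characters; 1800 per page is an assumed average."""
    return pages * chars_per_page


def free_quota_multiples(total_chars: int, free_chars: int = 10_000) -> float:
    """How many times the book exceeds a free allowance of `free_chars`."""
    return total_chars / free_chars


def estimate_cost(total_chars: int, price_per_1k_chars: float) -> float:
    """Cost for `total_chars` at a given (hypothetical) price per 1000 characters."""
    return total_chars / 1000 * price_per_1k_chars


if __name__ == "__main__":
    chars = book_characters(1000)  # 1,800,000 characters under the assumption above
    print("characters:", chars)
    print("free-quota multiples:", free_quota_multiples(chars))
    # PLACEHOLDER_PRICE is hypothetical; replace it with the real plan price.
    PLACEHOLDER_PRICE = 0.20
    print("estimated cost:", round(estimate_cost(chars, PLACEHOLDER_PRICE), 2))
```

Under these assumptions the book is about 1.8 million characters, or roughly 180 times the 10k-character free allowance, which is why a local solution is so attractive.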
It's all for my own personal use. Not selling or sharing.
Thanks a lot. x.o.x.o...