Tried to build a local voice cloning audiobook pipeline for Bulgarian — XTTS-v2 sounds Russian, Fish Speech 1.5 won't load on Windows. Anyone solved Cyrillic TTS locally?

Reddit r/LocalLLaMA / 3/21/2026

💬 Opinion / Developer Stack & Infrastructure / Tools & Practical Usage


Key Points

The post documents attempting to build a local Bulgarian audiobook voice cloning pipeline using XTTS-v2 (Coqui TTS) and Fish Speech 1.5, highlighting practical installation and compatibility challenges.
XTTS-v2 does not officially support Bulgarian, and even when forcing language='ru', the output is Russian-accented and voice similarity remains poor for Bulgarian.
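Fish Speech 1.5 looks promising because of its broad language coverage, but model loading on Windows is still failing. On top of that, the RTX 5070 Ti is not supported by stable PyTorch, so nightly builds are required, and other package installs keep downgrading PyTorch and forcing a reinstall of the nightly.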
The author concludes there is no good free local TTS solution with voice cloning for Bulgarian right now, notes that ElevenLabs supports Bulgarian but is paid beyond 10k characters, and asks for any working solutions.

Hi Everyone,

I tried this with the help of Claude because I am not very familiar with CMD, PowerShell, etc.

Tried to build a local Bulgarian audiobook voice cloner — here's what actually happened

Spent a full day trying to clone my voice locally and use it to read a book in Bulgarian. Here's the honest breakdown.

My setup: RTX 5070 Ti, 64GB RAM, Windows 11

Attempt 1: XTTS-v2 (Coqui TTS)

Looked promising — voice cloning from just 30 seconds of audio, runs locally, free. Got it installed after fighting some transformers version conflicts. Generated audio successfully.

Result: sounds Russian. Not even close to Bulgarian. XTTS-v2 officially supports 17 languages and Bulgarian isn't one of them. Using language="ru" is the community workaround, but the output is clearly Russian-accented. Also, the similarity to my actual voice was poor regardless of language.
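For anyone who wants to reproduce the language="ru" workaround, here is a minimal sketch using the Coqui TTS Python API. It is not necessarily the exact script used above. The file names (my_voice.wav, chapter_01.wav) and the helper names are hypothetical, and the TTS package is imported lazily so the rest of the script can be checked without it installed.

```python
# Sketch of the XTTS-v2 "ru" workaround for Bulgarian text.
# Assumptions: the Coqui `TTS` package is installed; file names are placeholders.

def xtts_kwargs(text, speaker_wav, out_path, language="ru"):
    """Build the keyword arguments passed to tts_to_file().

    Bulgarian has no language code in XTTS-v2, so "ru" is used as a stand-in.
    """
    return {
        "text": text,
        "speaker_wav": speaker_wav,  # reference recording of the voice to clone
        "language": language,
        "file_path": out_path,
    }


def synthesize(text, speaker_wav, out_path, language="ru", device="cuda"):
    """Load XTTS-v2 and write the synthesized speech to out_path."""
    from TTS.api import TTS  # imported here so the helper above works without TTS

    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)
    tts.tts_to_file(**xtts_kwargs(text, speaker_wav, out_path, language))
    return out_path


if __name__ == "__main__":
    # Only prints the arguments; call synthesize(...) to actually generate audio.
    print(xtts_kwargs("Здравей, свят!", "my_voice.wav", "chapter_01.wav"))
```

Calling synthesize("Здравей, свят!", "my_voice.wav", "chapter_01.wav") should produce audio, but as described above, expect a Russian accent because the model is being told the text is Russian.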

Attempt 2: Fish Speech 1.5

More promising on paper — trained on 80+ languages including Cyrillic scripts, no language-specific preprocessing needed. Got it installed. Still working through some model loading issues on Windows.

What made everything harder than it should be:

The RTX 5070 Ti (Blackwell architecture) isn't supported by stable PyTorch yet. Had to use nightly builds. Every single package install would silently downgrade PyTorch back to 2.5.1, breaking GPU support. Had to force reinstall the nightly after almost every step.
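A small check like the following could be run after each package install to catch the silent downgrade early. This is only a sketch: it assumes nightly builds carry ".dev" in their version string (for example 2.8.0.dev20250512+cu128, an illustrative value), and the function names are made up for this example.

```python
# Sketch: detect whether the installed torch is still a nightly build.
# Assumption: nightly version strings contain ".dev"; stable ones (e.g. 2.5.1) do not.
from importlib import metadata
from typing import Optional


def is_nightly(version: str) -> bool:
    """Return True if the version string looks like a nightly build."""
    return ".dev" in version


def installed_torch_version() -> Optional[str]:
    """Return the installed torch version, or None if torch is not installed."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


def check_torch() -> str:
    """Describe the state of the installed torch package in one line."""
    version = installed_torch_version()
    if version is None:
        return "torch is not installed"
    if not is_nightly(version):
        return f"torch {version} is not a nightly build; it may have been downgraded"
    return f"torch {version} looks like a nightly build"


if __name__ == "__main__":
    print(check_torch())
```

If the check passes, torch.cuda.is_available() is a reasonable next thing to look at. Pinning the nightly version in a pip constraints file (pip install -c constraints.txt ...) may also stop other packages from replacing it, though I have not verified that with these particular packages.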

Bottom line so far:

There is no good free local TTS solution with voice cloning for Bulgarian right now. ElevenLabs supports it natively but it's paid beyond 10k characters. If anyone has actually solved this I'd love to know.

I appreciate any help or suggestions on what software I can use to create my own audiobooks with a good-sounding cloned voice.

I also tried ElevenLabs, but they want so much money for creating one small book that I can't imagine what one book of 1000 pages would cost.
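To give a rough sense of scale, here is a back-of-the-envelope estimate. The 1,800 characters per page figure is purely an assumption for illustration (real books vary a lot), and no pricing is assumed. The 10k figure is the free limit mentioned above.

```python
# Rough size estimate for a long book, compared with a 10k-character free allowance.
# Assumption: 1,800 characters per page is a hypothetical average, not a measured value.
CHARS_PER_PAGE = 1_800
FREE_CHARS = 10_000  # free limit mentioned in this post


def book_characters(pages: int, chars_per_page: int = CHARS_PER_PAGE) -> int:
    """Estimate the total number of characters in a book."""
    return pages * chars_per_page


def free_allowances_needed(total_chars: int, free_chars: int = FREE_CHARS) -> int:
    """How many free allowances the text would consume (rounded up)."""
    return -(-total_chars // free_chars)  # ceiling division


if __name__ == "__main__":
    total = book_characters(1000)
    print(total)                          # 1800000
    print(free_allowances_needed(total))  # 180
```

Under that assumption, a 1000-page book is about 1.8 million characters, or roughly 180 times the free allowance.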

It's all for my own personal use. Not selling or sharing.

Thanks a lot. x.o.x.o...

submitted by /u/Binqta
