JaiTTS: A Thai Voice Cloning Model
arXiv cs.CL / 5/1/2026
Key Points
- JaiTTS-v1.0 is a Thai voice cloning text-to-speech model developed using continual training on a large Thai-focused speech corpus.
- Built on a tokenizer-free autoregressive TTS architecture adapted from VoxCPM, JaiTTS-v1.0 can handle numerals and Thai-English code-switching directly without explicit text normalization.
- The researchers evaluate both short- and long-duration speech generation to mirror realistic deployment scenarios.
- The authors report state-of-the-art results: a character error rate (CER) of 1.94% on short-duration tasks, slightly below the 1.98% measured on the human ground-truth speech, and human-level CER on long-duration tasks.
- In human preference tests against flagship commercial systems, JaiTTS-v1.0 wins 283 of 400 pairwise comparisons and loses only 58.
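CER is the character-level edit distance (substitutions + deletions + insertions) between a reference text and a hypothesis, divided by the number of reference characters, so lower is better. In TTS evaluation the hypothesis is typically an ASR transcript of the synthesized audio (and, for the human baseline, of the real recordings); the summary does not say which ASR system or text normalization the authors used. The sketch below is a minimal illustration under two stated assumptions: characters are plain Unicode code points (so a Thai vowel or tone mark counts as its own character), and the corpus score is micro-averaged. The function names are hypothetical, not taken from the paper.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character substitutions, deletions and
    insertions turning `hyp` into `ref` (dynamic programming, one row at a time)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # reference char missing from hyp (deletion)
                curr[j - 1] + 1,     # extra char in hyp (insertion)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of one utterance; can exceed 1.0 with many insertions."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


def corpus_cer(pairs: list[tuple[str, str]]) -> float:
    """Micro-averaged CER: total edits over total reference characters
    (an assumption; per-utterance averaging is another common choice)."""
    edits = sum(levenshtein(r, h) for r, h in pairs)
    chars = sum(len(r) for r, _ in pairs)
    if chars == 0:
        raise ValueError("references contain no characters")
    return edits / chars
```

For example, a hypothetical ASR transcript that drops one digit from the 12-code-point reference `"ราคา 100 บาท"` gives `cer("ราคา 100 บาท", "ราคา 10 บาท") == 1/12`, about 8.3%. Reported CERs near 2% mean roughly two character errors per hundred reference characters.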
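The preference figures can be restated as rates. The summary gives only wins and losses, so the sketch below assumes the remaining 59 comparisons were ties or "no preference" outcomes; it also does not say how the 400 comparisons split across the individual commercial systems.

$$
\text{win rate} = \frac{283}{400} = 70.75\%,\qquad
\text{loss rate} = \frac{58}{400} = 14.5\%,\qquad
\text{other} = \frac{400 - 283 - 58}{400} = \frac{59}{400} = 14.75\%
$$

$$
\text{win share of decisive comparisons} = \frac{283}{283 + 58} = \frac{283}{341} \approx 83.0\%
$$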