New TTS Model: VoxCPM2

Reddit r/LocalLLaMA / 4/9/2026


Key Points

  • VoxCPM2 is a new text-to-speech (TTS) model that supports three speech-generation modes: Voice Design, Controllable Cloning, and Ultimate Cloning via audio continuation.
  • The project provides a live demo on Hugging Face (VoxCPM-Demo) and an official model page for VoxCPM2.
  • VoxCPM2 reports state-of-the-art or competitive performance across major zero-shot and controllable TTS benchmarks.
  • Full benchmark tables for Seed-TTS-eval, CV3-eval, InstructTTSEval, and MiniMax Multilingual Test are available in the project's GitHub repository.

VoxCPM2 — Three Modes of Speech Generation:

🎨 Voice Design — Create a brand-new voice

🎛️ Controllable Cloning — Clone a voice with optional style guidance

🎙️ Ultimate Cloning — Reproduce every vocal nuance through audio continuation

Demo

https://huggingface.co/spaces/openbmb/VoxCPM-Demo

Performance

VoxCPM2 achieves state-of-the-art or competitive results on major zero-shot and controllable TTS benchmarks.

See the GitHub repo for full benchmark tables (Seed-TTS-eval, CV3-eval, InstructTTSEval, MiniMax Multilingual Test).

Model

https://huggingface.co/openbmb/VoxCPM2

submitted by /u/foldl-li