OpenBMB just released VoxCPM2, a significant step up from the 0.5B-parameter VoxCPM1.5 in both scale and capabilities.
VoxCPM1.5 → VoxCPM2:

| | VoxCPM1.5 | VoxCPM2 |
|---|---|---|
| Params | 0.5B | |
| Audio quality | 44.1kHz | 48kHz |
| Languages | Chinese + English | 30 |
| Training data | 1.8M hours | |
| RTF (RTX 4090) | 0.17 | 0.30 |
| Voice Design | ❌ | ✅ |
New in VoxCPM2:
- Voice Design — generate a novel voice from a text description alone, no reference audio needed
- Controllable Cloning — clone + steer emotion, pace, expression
- Ultimate Cloning — max fidelity with reference audio + transcript
- ~8GB VRAM, streaming support
HuggingFace: https://huggingface.co/openbmb/VoxCPM2
Anyone tested VoxCPM2 yet?
- vs Qwen3-TTS — naturalness and multilingual coverage?
- vs Open-MOSS — latency and voice quality?
- OmniVoice (k2-fsa) — covers 646 languages vs VoxCPM2's 30, RTF of 0.025 vs 0.30, but 24kHz vs 48kHz. Quality tradeoff worth it for the speed and language coverage?
- Does Voice Design (no reference audio) actually hold up?
- Non-English results?
Audio comparisons would be great if anyone has them.
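For anyone comparing the RTF numbers above: real-time factor is synthesis wall-clock time divided by audio duration, so anything under 1.0 is faster than real time. A quick back-of-envelope in plain Python (no model needed; the numbers are just the RTFs quoted above):

```python
def synthesis_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimated wall-clock time to synthesize a clip at a given real-time factor."""
    return audio_seconds * rtf

# Time to generate 60 s of speech:
print(synthesis_seconds(60, 0.30))   # VoxCPM2 on an RTX 4090 -> 18.0 s
print(synthesis_seconds(60, 0.025))  # OmniVoice (k2-fsa)     -> 1.5 s
```

So OmniVoice is roughly 12x faster per second of audio, which is the speed side of the quality tradeoff being asked about.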