Local voice cloning with expression system

Reddit r/LocalLLaMA / 3/30/2026

Key Points

  • The post asks whether any local (on-device) voice-cloning models can also generate expressive, emotional speech and run well on a GPU with 8 GB of VRAM (e.g., RTX 4060).
  • It focuses on practical feasibility for hobbyist or privacy-preserving setups, emphasizing both voice cloning capability and controllable expression.
  • The request is framed as a recommendation-seeking question rather than a report of a new release or breakthrough.

Are there any local models that can clone voices and also support some form of expression/emotion on a GPU with 8 GB of VRAM (RTX 4060)?

submitted by /u/Sea-Vehicle8208