As the title says, do you talk to your LLM using speech recognition and listen back its answers with TTS models?
Last night I didn't slept much so I sit on computer and installed Fast-Kokoro for TTS and configured Koboldcpp using Whisper model and so far it seems to be great experience with SillyTavern and Gemma 4 small E4B model.
I have RTX 4060 Ti with 16 GB VRAM and 32 GB of RAM and with this setup (SillyTavern + Koboldcpp + Whisper + Gemma 4-E4B + Fast Kokoro) it is almost real time, so it is relistic to use for talking with voice.
Since this is quite new to me (previously only used TTS long time ago for testing), I was wondering how others here are doing. Do you talk to your LLM's or is it more rare use case?
[link] [comments]




