Trying to build this pipeline:
VAD speech chunk > LLM > TTS
skipping the ASR step completely, so the audio chunk goes straight into the LLM
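
For context, the flow I have in mind is roughly the following. This is a minimal sketch only: the energy-threshold VAD is a simple stand-in for a real VAD model, and `llm` and `tts` are hypothetical callables, not the API of llama.cpp, LiteRT-LM, or any other library.

```python
from typing import Callable, Iterable, Iterator, List

Frame = List[float]  # one frame of mono PCM samples in [-1.0, 1.0]


def rms(frame: Frame) -> float:
    """Root-mean-square energy of a frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return (sum(s * s for s in frame) / len(frame)) ** 0.5


def vad_chunks(
    frames: Iterable[Frame],
    threshold: float = 0.02,  # assumed energy threshold, tune per microphone
    hangover: int = 3,        # quiet frames in a row that close a chunk
) -> Iterator[List[float]]:
    """Yield speech chunks: loud frames joined together, quiet frames dropped."""
    chunk: List[float] = []
    quiet = 0
    for frame in frames:
        if rms(frame) >= threshold:
            chunk.extend(frame)
            quiet = 0
        elif chunk:
            quiet += 1
            if quiet >= hangover:
                yield chunk
                chunk, quiet = [], 0
    if chunk:
        # Flush speech that was still open when the stream ended.
        yield chunk


def run_pipeline(
    frames: Iterable[Frame],
    llm: Callable[[List[float]], str],  # hypothetical: raw audio in, text out
    tts: Callable[[str], object],       # hypothetical: text in, audio out
) -> list:
    """VAD chunk > LLM > TTS, with no ASR step in between."""
    return [tts(llm(chunk)) for chunk in vad_chunks(frames)]
```

The step that fails for me is the `llm(chunk)` call, i.e. passing raw audio to the model.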
but audio input just refuses to work
tried multiple llama.cpp builds and Unsloth Studio
no luck so far
the only thing that works is Google's LiteRT-LM
but it forces CPU-only inference when audio is involved
and that kills performance
saw on GitHub that the GPU implementation is still pending
any workaround or a different stack that actually works?



