Local AI that feels as fast as frontier.

Reddit r/LocalLLaMA / 3/30/2026

💬 Opinion · Signals & Early Trends · Ideas & Deep Analysis · Tools & Practical Usage

Key Points

  • The post argues that local LLM chat experiences can feel much faster by adopting “duplex” interaction patterns that respond immediately after partial input rather than waiting for full completion.
  • It cites Nvidia’s PersonaPlex voice model as an example of full-duplex behavior (listening while the user speaks and replying right after), and proposes applying a similar idea to text via streaming.
  • The author claims that although duplex/text-streaming may not reduce the model’s actual compute time, it improves “perceived speed,” making a local LLM feel closer to fast API-based frontier models.
  • The author shares a specific open-source project (“duplex-chat”) and notes personal testing on a local setup using MLX with Qwen 3.5 32B (a3b), encouraging feedback on the approach.
  • The post highlights the difficulty of benchmarking perceived responsiveness versus real latency, suggesting evaluation should account for user experience rather than only end-to-end timing.

A thought occurred to me a little while ago when I was installing a voice model for my local AI. The model I chose was PersonaPlex, a model made by Nvidia which features full-duplex interactions. That means it listens while you speak and then replies the second you are done. The user experience was infinitely better than with a normal STT model.

So why don't we do this with text? It takes me a good 20 seconds to type my local assistant a message, and only then does it begin processing and reply. That is all time we could absorb by using text streaming. NGL, benchmarking this is hard because it doesn't actually improve speed, it improves perceived speed. But it does make a local LLM seem like it's replying nearly as fast as API-based frontier models. Let me know what you guys think. I use it on MLX with Qwen 3.5 32B (a3b).
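The core idea, as I understand it, can be sketched in plain Python: feed partial input to a background worker while the user is still typing, so that by the time they hit send, most of the prompt has already been processed and generation can start almost immediately. This is only a minimal illustration of the pattern, not the actual duplex-chat implementation; the class name, methods, and the "prefill" placeholder are all hypothetical (a real version would extend a model's KV cache incrementally instead of appending strings).

```python
import threading
import queue

class DuplexChat:
    """Sketch of duplex-style text input: process partial input in the
    background while the user is still typing. All model calls here are
    simulated placeholders (hypothetical, not the duplex-chat API)."""

    def __init__(self):
        self._chunks = queue.Queue()
        self._prefilled = []  # stands in for a real KV cache
        self._worker = threading.Thread(target=self._prefill_loop, daemon=True)
        self._worker.start()

    def _prefill_loop(self):
        while True:
            chunk = self._chunks.get()
            if chunk is None:  # sentinel: user hit "send"
                break
            # Placeholder for incremental prompt prefill, e.g. extending
            # a KV cache with newly typed tokens while the user types.
            self._prefilled.append(chunk)

    def on_keystrokes(self, text):
        """Called as the user types; streams partial input to the worker."""
        self._chunks.put(text)

    def submit(self):
        """User pressed send: finish prefill, then generate immediately."""
        self._chunks.put(None)
        self._worker.join()
        prompt = "".join(self._prefilled)
        return f"echo: {prompt}"  # placeholder for actual model generation

chat = DuplexChat()
for piece in ["why is ", "the sky ", "blue?"]:
    chat.on_keystrokes(piece)  # streamed while "typing"
reply = chat.submit()
print(reply)
```

Note that total compute is unchanged; the win is purely in overlap, which is exactly why this shows up as perceived speed rather than in end-to-end benchmarks.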

https://github.com/Achilles1089/duplex-chat

submitted by /u/habachilles