What do you consider to be the minimum performance (t/s) for local Agent workflows?

Reddit r/LocalLLaMA / 4/25/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage

Key Points

  • A Reddit user asks what minimum tokens-per-second (t/s) people will tolerate for local agent workflows.
  • They report running pi.dev connected to a llama.cpp instance with Qwen3.6-27B-Q6_K_L (200K context) on an RTX A6000 and getting about 26 t/s, finding it surprisingly usable.
  • The user says this performance feels similar to their Claude Code experience with Anthropic, though they’ve only tested with relatively simple prompts so far.
  • They are currently experimenting further with Brave Search API alongside the local agent setup.

What would you say is the minimum amount of tokens per second you would tolerate for your local agent workflows?

I have been trying pi.dev connected to a llama.cpp instance running Qwen3.6-27B-Q6_K_L with 200K context on an RTX A6000. I get about 26 t/s, and it is surprisingly usable. It's about the same user experience I get with Claude Code connected to Anthropic. But I have just been fooling around with relatively simple prompts so far. I'm trying out the Brave Search API.
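To put the reported figure in perspective, a minimal sketch (not from the post; the token counts are illustrative assumptions) of how a tokens-per-second rate translates into wall-clock wait for an agent response:

```python
# Illustrative arithmetic: how a generation rate in tokens/second
# maps to the time spent waiting for a response of a given length.

def response_seconds(tokens: int, tps: float) -> float:
    """Seconds to stream `tokens` output tokens at `tps` tokens/second."""
    return tokens / tps

# At the ~26 t/s reported in the post, a hypothetical 500-token agent
# reply streams in about 19 seconds; a 2,000-token reply takes ~77 s.
print(round(response_seconds(500, 26.0), 1))
print(round(response_seconds(2000, 26.0), 1))
```

In practice this only covers generation; prompt-processing time for a long (e.g. 200K-token) context adds latency before the first token appears, which is often the bigger factor in agent loops that re-send large contexts.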

submitted by /u/MexInAbu