Is there any way to run bigger models at 20 t/s with 24 GB VRAM + 64 GB DDR5 RAM?

Reddit r/LocalLLaMA / 4/25/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage

Key Points

  • The post asks whether it is feasible to run larger LLMs at roughly 20 tokens/second on a setup with 24 GB of VRAM and 64 GB of DDR5 RAM (see the back-of-envelope sketch below).
  • It references the current success of Qwen 27B for coding and speculates that an upcoming 122B model could be better.
  • The author expresses surprise at the strong performance of a “dense” model and mentions that they no longer use Codex for their C++ programming.
  • Overall, the content frames the question as a practical feasibility/performance discussion for local LLM deployment rather than a new product announcement.

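As a rough feasibility check (not part of the original post): on a split GPU/CPU setup, single-stream decode speed is largely memory-bandwidth bound, because every generated token streams the model weights once. The sketch below plugs in assumed numbers (a dense ~122B model, roughly 4.5 bits per weight after quantization, ~900 GB/s of GPU memory bandwidth, and ~80 GB/s of dual-channel DDR5 bandwidth) to show why a dense 122B is unlikely to reach 20 t/s on 24 GB VRAM + 64 GB RAM, while a dense 27B fits entirely in VRAM and easily can. If the 122B turns out to be a sparse MoE model, only the active parameters are read per token, so offloading experts to system RAM (for example with llama.cpp) can be considerably faster than this dense estimate suggests.

```python
# Rough, assumption-heavy estimate of hybrid GPU/CPU decode speed.
# None of these numbers come from the post; they are illustrative guesses.

def estimate_decode_speed(
    n_params_b: float,             # model size in billions of parameters
    bits_per_weight: float = 4.5,  # ~Q4-style quantization incl. overhead (assumed)
    vram_gb: float = 24.0,         # VRAM available for weights (ignores KV cache)
    gpu_bw_gbs: float = 900.0,     # GPU memory bandwidth, order of magnitude (assumed)
    ram_bw_gbs: float = 80.0,      # dual-channel DDR5 bandwidth, order of magnitude (assumed)
) -> float:
    """Return an optimistic tokens/second estimate for a dense model."""
    weights_gb = n_params_b * bits_per_weight / 8.0  # total weight footprint in GB
    on_gpu_gb = min(vram_gb, weights_gb)             # portion that fits in VRAM
    on_cpu_gb = weights_gb - on_gpu_gb               # portion offloaded to system RAM

    # Dense decode reads every weight once per token, so time per token is
    # roughly (GPU bytes / GPU bandwidth) + (CPU bytes / RAM bandwidth).
    seconds_per_token = on_gpu_gb / gpu_bw_gbs + on_cpu_gb / ram_bw_gbs
    return 1.0 / seconds_per_token

if __name__ == "__main__":
    # the dense ~27B mentioned in the post, and a hypothetical dense 122B
    for size_b in (27.0, 122.0):
        print(f"~{size_b:.0f}B dense: roughly {estimate_decode_speed(size_b):.1f} tok/s")
```
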
I know the new Qwen 27B is amazing right now for coding in general, but since the 122B is supposed to be coming as well, I guess it’s expected to be better? I’m actually surprised at how well this dense model performs; I haven’t used Codex at all anymore for my C++ programming needs.

submitted by /u/soyalemujica