Is there any way to run bigger models at 20 t/s with 24 GB VRAM + 64 GB DDR5 RAM?

Reddit r/LocalLLaMA / 4/25/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage

Key Points

  • The post asks whether it is feasible to run larger LLMs at roughly 20 tokens/second on a setup with 24 GB of VRAM and 64 GB of DDR5 RAM (see the back-of-envelope sketch below).
  • It references the current success of Qwen 27B for coding and speculates that an upcoming 122B model could be better.
  • The author expresses surprise at the strong performance of a “dense” model and mentions that they no longer use Codex for their C++ programming.
  • Overall, the content frames the question as a practical feasibility/performance discussion for local LLM deployment rather than a new product announcement.

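As a rough feasibility check (not part of the original post): on a split GPU/CPU setup, single-stream decode speed is largely memory-bandwidth bound, because every generated token streams the model weights once. The sketch below plugs in assumed numbers (a dense ~122B model, roughly 4.5 bits per weight after quantization, ~900 GB/s of GPU memory bandwidth, and ~80 GB/s of dual-channel DDR5 bandwidth) to show why a dense 122B is unlikely to reach 20 t/s on 24 GB VRAM + 64 GB RAM, while a dense 27B fits entirely in VRAM and easily can. If the 122B turns out to be a sparse MoE model, only the active parameters are read per token, so offloading experts to system RAM (for example with llama.cpp) can be considerably faster than this dense estimate suggests.

```python
# Rough, assumption-heavy estimate of hybrid GPU/CPU decode speed.
# None of these numbers come from the post; they are illustrative guesses.

def estimate_decode_speed(
    n_params_b: float,             # model size in billions of parameters
    bits_per_weight: float = 4.5,  # ~Q4-style quantization incl. overhead (assumed)
    vram_gb: float = 24.0,         # VRAM available for weights (ignores KV cache)
    gpu_bw_gbs: float = 900.0,     # GPU memory bandwidth, order of magnitude (assumed)
    ram_bw_gbs: float = 80.0,      # dual-channel DDR5 bandwidth, order of magnitude (assumed)
) -> float:
    """Return an optimistic tokens/second estimate for a dense model."""
    weights_gb = n_params_b * bits_per_weight / 8.0  # total weight footprint in GB
    on_gpu_gb = min(vram_gb, weights_gb)             # portion that fits in VRAM
    on_cpu_gb = weights_gb - on_gpu_gb               # portion offloaded to system RAM

    # Dense decode reads every weight once per token, so time per token is
    # roughly (GPU bytes / GPU bandwidth) + (CPU bytes / RAM bandwidth).
    seconds_per_token = on_gpu_gb / gpu_bw_gbs + on_cpu_gb / ram_bw_gbs
    return 1.0 / seconds_per_token

if __name__ == "__main__":
    # the dense ~27B mentioned in the post, and a hypothetical dense 122B
    for size_b in (27.0, 122.0):
        print(f"~{size_b:.0f}B dense: roughly {estimate_decode_speed(size_b):.1f} tok/s")
```
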
I know the new Qwen 27B is amazing right now for coding in general, but since the 122B is supposed to be coming as well, I guess it’s expected to be better? I’m actually surprised at how well this dense model performs; I haven’t used Codex at all anymore for my C++ programming needs.

submitted by /u/soyalemujica