GLM 5.1 Locally: 40 tps, 2000+ pp/s

Reddit r/LocalLLaMA / 4/26/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Models & Research

Key Points

  • The author reports getting a REAP-pruned NVFP4 build of GLM 5.1 running locally with stable, fast inference on four RTX 6000 Pro GPUs power-limited to 350 W.
  • Throughput is reported by context depth: prefill throughput (PP@4096) falls from ~2229 t/s at zero context to ~864 t/s at 64K, while generation throughput (TG@512) declines more gently, from ~42 to ~36 t/s.
  • Peak burst generation throughput is about 43 t/s, and the overall experience with opencode is described as close to Sonnet + Claude Code.
  • Sessions with 100–200k-token contexts are said to be stable; the author plans to try different concurrency settings and notes that concurrency = 2 averages ~65 t/s of generation throughput.
  • The post asks whether others have seen better performance on the same hardware.

After some sglang patching and countless experiments, managed to get the REAP-pruned NVFP4 version running stable and FAST on 4 x RTX 6000 Pros (power-limited to 350W). Very happy with the performance and quality. Inference software is still under-optimized for these cards; I think we will see their true potential unfold later this year or early next.
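For anyone trying to reproduce this kind of setup, here is a minimal sketch of the power cap plus server launch, assuming an sglang install and a local copy of the checkpoint. The model path is hypothetical, and the exact launch flags vary by sglang version (NVFP4/ModelOpt quantization is typically picked up from the checkpoint's quant config, but check your version's docs).

```python
import subprocess

# Cap each of the four boards at 350 W, as in the post (needs root;
# resets on reboot unless persistence mode is configured).
for gpu in range(4):
    subprocess.run(["nvidia-smi", "-i", str(gpu), "-pl", "350"], check=True)

# Serve the model with tensor parallelism across the 4 cards.
# "/models/glm-5.1-reap-nvfp4" is a hypothetical local path.
subprocess.run(
    [
        "python", "-m", "sglang.launch_server",
        "--model-path", "/models/glm-5.1-reap-nvfp4",
        "--tp-size", "4",
        "--port", "30000",
    ],
    check=True,
)
```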

Throughput by Context Depth

Context prefilled   PP@4096 (t/s)   TG@512 (t/s)   TG@512 peak, burst (t/s)
0                   2229.0          42.03          43.00
4K                  1943.6          41.41          42.00
16K                 1558.9          39.72          40.00
32K                 1234.2          38.19          39.00
64K                 863.5           35.87          37.00
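For context, a rough way to reproduce numbers like these against a local OpenAI-compatible endpoint (which sglang exposes by default): time-to-first-token over a prompt of known length approximates prefill t/s, and the streaming rate after the first token approximates generation t/s. A sketch only; the port and served model name are assumptions, and counting one streamed chunk as one token is an approximation.

```python
import time

from openai import OpenAI  # pip install openai

# Assumed local endpoint and model name.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="none")

def measure(prompt: str, prompt_tokens: int, max_tokens: int = 512):
    """Approximate prefill t/s (via time-to-first-token) and generation t/s."""
    start = time.perf_counter()
    first_token_at = None
    completion_tokens = 0
    stream = client.chat.completions.create(
        model="glm-5.1",  # hypothetical served model name
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            if first_token_at is None:
                first_token_at = time.perf_counter()
            completion_tokens += 1  # crude: one chunk ~ one token
    end = time.perf_counter()
    prefill_tps = prompt_tokens / (first_token_at - start)
    gen_tps = completion_tokens / (end - first_token_at)
    return prefill_tps, gen_tps

# Example: ~4K-token prompt built by repetition (rough token count).
pp, tg = measure("word " * 4096, prompt_tokens=4096)
print(f"prefill ~{pp:.0f} t/s, generation ~{tg:.0f} t/s")
```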

Overall experience with opencode is pretty close to Sonnet + Claude Code. Sessions with 100-200k-token contexts are stable.

Will play with different concurrency settings this weekend.

Anyone seen better performance on this hardware?

PS: concurrency = 2 worked great. Generation hits 65 tps average.
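A quick sketch of how the concurrency = 2 aggregate number can be checked: fire two streaming requests at once and divide total generated tokens by wall-clock time. Same assumptions as above (endpoint, model name, one chunk ~ one token).

```python
import asyncio
import time

from openai import AsyncOpenAI  # pip install openai

client = AsyncOpenAI(base_url="http://localhost:30000/v1", api_key="none")

async def generate(prompt: str, max_tokens: int = 512) -> int:
    """Stream one completion and return a rough token count."""
    tokens = 0
    stream = await client.chat.completions.create(
        model="glm-5.1",  # hypothetical served model name
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        stream=True,
    )
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            tokens += 1
    return tokens

async def main(concurrency: int = 2):
    start = time.perf_counter()
    counts = await asyncio.gather(
        *(generate("Write a long story.") for _ in range(concurrency))
    )
    elapsed = time.perf_counter() - start
    print(f"aggregate: {sum(counts) / elapsed:.1f} tok/s across {concurrency} streams")

asyncio.run(main())
```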

submitted by /u/val_in_tech