AI Navigate

GLM 4.7 on dual RTX Pro 6000 Blackwell

Reddit r/LocalLLaMA / 3/16/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

Key Points

  • A Reddit post asks whether the full 358B GLM 4.7 model can fit entirely into 192GB of VRAM on dual RTX Pro 6000 Blackwell cards, and which quantization (including NVFP4) is feasible at batch size 1 with input lengths under 4096 tokens; a rough fit estimate is sketched after this list.
  • The author notes that online VRAM calculators may be conservative and seeks real-world results rather than theoretical estimates.
  • If the 192GB setup cannot fit the model, the post asks for alternative model recommendations suitable for the same hardware and use case (roleplay and general tool calling with RAG).
  • The thread provides a link to the Reddit post by user mircM52 for discussion and comments.
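
A quick back-of-envelope check of the fit question: total VRAM is roughly weights plus KV cache plus runtime overhead. The Python sketch below is an estimate under stated assumptions, not a measurement; the layer count, KV-head count, and head dimension are illustrative placeholders rather than GLM 4.7's published configuration, and treating NVFP4 as ~4.5 effective bits per weight (4-bit values plus an FP8 scale per 16-element block) is an approximation.

    # Minimal VRAM estimator: weights + KV cache + runtime overhead.
    # Model-shape numbers passed in below are illustrative placeholders,
    # NOT GLM 4.7's published configuration.

    def estimate_vram_gb(
        params_b: float,           # total parameters, in billions
        bits_per_weight: float,    # effective bits per weight after quantization
        n_layers: int,
        n_kv_heads: int,           # KV heads (GQA), not query heads
        head_dim: int,
        seq_len: int,
        kv_bytes: int = 2,         # FP16 KV cache
        overhead_gb: float = 5.0,  # CUDA context, activations, fragmentation
    ) -> float:
        weights_gb = params_b * bits_per_weight / 8  # 1e9 params * bits -> GB
        # KV cache at batch size 1: two tensors (K and V) per layer.
        kv_gb = 2 * n_layers * n_kv_heads * head_dim * seq_len * kv_bytes / 1e9
        return weights_gb + kv_gb + overhead_gb

    # NVFP4 stores 4-bit values plus an FP8 scale per 16-element block,
    # i.e. roughly 4 + 8/16 = 4.5 effective bits per weight.
    print(estimate_vram_gb(358, 4.5, n_layers=92, n_kv_heads=8,
                           head_dim=128, seq_len=4096))  # ~208 GB: over budget
    print(estimate_vram_gb(358, 4.0, n_layers=92, n_kv_heads=8,
                           head_dim=128, seq_len=4096))  # ~186 GB: tight fit

On these placeholder numbers, NVFP4 weights alone come to about 201GB, consistent with calculators saying the model "just barely doesn't fit" in 192GB, while a flat 4.0-bit quant squeaks under; in practice, embeddings and other layers kept at higher precision push the total up, which is why the author's request for real-world results is a reasonable one.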

Has anyone gotten this model (the full 358B version) to fit entirely into 192GB VRAM? If so, what's the highest quant that works (does NVFP4 fit)? Batch size 1, input sequence <4096 tokens. The theoretical calculators online say it just barely doesn't fit, but those tend to be conservative, so I wanted to know whether anyone has actually gotten this working in practice.

If it doesn't fit, does anyone have other model recommendations for this setup? Primary use case is roleplay (nothing NSFW) and general assistance (basic tool calling and RAG).

Apologies if this has been asked before; I can't seem to find it! And thanks in advance!

submitted by /u/mircM52