Has anyone gotten this model (the full 358B version) to fit entirely into 192GB VRAM? If so, what's the highest quant (does NVFP4 fit)? Batch size 1, input sequence <4096 tokens. The theoretical calculators online say it just barely doesn't fit, but I think these tend to be conservative so I wanted to know if anyone actually got this working in practice.
If it doesn't fit, does anyone have other model recommendations for this setup? Primary use case is roleplay (nothing NSFW) and general assistance (basic tool calling and RAG).
Apologies if this has been asked before, I can't seem to find it! And thanks in advance!
[link] [comments]

