96GB VRAM. What to run in 2026?

Reddit r/LocalLLaMA / 4/10/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage

Key Points

  • The post discusses whether 96GB of VRAM is an awkward middle ground for local LLM use in 2026: too small for the largest models, yet more than mid-sized models need.
  • The author previously planned a multi-GPU setup using four RTX 3090s, but is reconsidering due to newer model releases such as Qwen 3.5 and Gemma 4.
  • The author asks community members what they run as their main local model given the tradeoffs of a 96GB VRAM budget.
  • Implicitly, it highlights ongoing concerns about selecting an optimal model size and configuration for local inference hardware as model capabilities outpace VRAM budgets.

I was all set on going the 4x 3090 route, but with the recent releases of Qwen 3.5 and Gemma 4 I'm having second thoughts. 96GB of VRAM seems to be in a weird spot: it's not enough to run the larger models and more than needed for the mid-sized ones. What are you running as your main model?
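For a rough sense of why 96GB feels awkward, here is a minimal back-of-envelope sketch. The bytes-per-parameter figures, the flat overhead allowance, and the example model sizes are all assumptions for illustration, not benchmarks of any specific release: real usage depends heavily on context length, KV cache size, and the inference runtime.

```python
# Hypothetical back-of-envelope VRAM math for the tradeoff described above.
# Bytes-per-parameter and overhead are assumptions, not measurements.

BYTES_PER_PARAM = {"fp16": 2.0, "q8": 1.0, "q4": 0.55}  # q4 incl. quant overhead

def est_vram_gb(params_b: float, quant: str, overhead_gb: float = 4.0) -> float:
    """Estimate VRAM (GB) for weights plus a flat allowance for KV cache/activations."""
    return params_b * BYTES_PER_PARAM[quant] + overhead_gb

budget_gb = 4 * 24  # 4x RTX 3090 = 96 GB total

# Illustrative model sizes only -- not the actual sizes of Qwen 3.5 or Gemma 4.
for name, params_b in [("~30B mid model", 30), ("~70B dense", 70), ("~235B-class", 235)]:
    for quant in ("q4", "q8"):
        need = est_vram_gb(params_b, quant)
        fits = "fits" if need <= budget_gb else "does not fit"
        print(f"{name} @ {quant}: ~{need:.0f} GB -> {fits} in {budget_gb} GB")
```

Under these assumptions a ~70B dense model fits at 4-bit or even 8-bit with room to spare, while a ~235B-class model does not fit even at 4-bit, which is the gap the post is describing. Longer context windows widen the gap further, since the KV cache grows with context length.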

submitted by /u/inthesearchof