96GB VRAM. What to run in 2026?

Reddit r/LocalLLaMA / 4/10/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage

Key Points

  • The post discusses whether 96GB of VRAM is an awkward middle ground for local LLM use in 2026: too small for the largest models, yet more than mid-sized models need.
  • The author previously planned a multi-GPU setup using four RTX 3090s, but is reconsidering due to newer model releases such as Qwen 3.5 and Gemma 4.
  • The author asks community members what they run as their main local model given the tradeoffs of a 96GB VRAM budget.
  • Implicitly, it highlights ongoing concerns about selecting an optimal model size and configuration for local inference hardware as model capabilities outpace VRAM budgets.

I was all set on going the 4x 3090 route, but with the recent releases of Qwen 3.5 and Gemma 4 I'm having second thoughts. 96GB of VRAM seems to be in a weird spot: it's not enough to run the larger models and more than needed for the mid-sized ones. What are you running as your main model?
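For a rough sense of why 96GB feels awkward, here is a minimal back-of-envelope sketch. The bytes-per-parameter figures, the flat overhead allowance, and the example model sizes are all assumptions for illustration, not benchmarks of any specific release: real usage depends heavily on context length, KV cache size, and the inference runtime.

```python
# Hypothetical back-of-envelope VRAM math for the tradeoff described above.
# Bytes-per-parameter and overhead are assumptions, not measurements.

BYTES_PER_PARAM = {"fp16": 2.0, "q8": 1.0, "q4": 0.55}  # q4 incl. quant overhead

def est_vram_gb(params_b: float, quant: str, overhead_gb: float = 4.0) -> float:
    """Estimate VRAM (GB) for weights plus a flat allowance for KV cache/activations."""
    return params_b * BYTES_PER_PARAM[quant] + overhead_gb

budget_gb = 4 * 24  # 4x RTX 3090 = 96 GB total

# Illustrative model sizes only -- not the actual sizes of Qwen 3.5 or Gemma 4.
for name, params_b in [("~30B mid model", 30), ("~70B dense", 70), ("~235B-class", 235)]:
    for quant in ("q4", "q8"):
        need = est_vram_gb(params_b, quant)
        fits = "fits" if need <= budget_gb else "does not fit"
        print(f"{name} @ {quant}: ~{need:.0f} GB -> {fits} in {budget_gb} GB")
```

Under these assumptions a ~70B dense model fits at 4-bit or even 8-bit with room to spare, while a ~235B-class model does not fit even at 4-bit, which is the gap the post is describing. Longer context windows widen the gap further, since the KV cache grows with context length.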

submitted by /u/inthesearchof