
Qwen 3.5: do I go dense or go bigger with MoE?

Reddit r/LocalLLaMA / 3/18/2026

💬 Opinion · Developer Stack & Infrastructure · Models & Research

Key Points

  • The discussion weighs two scaling paths: moving up to ~120B MoE models by adding VRAM, or speeding up a smaller dense model by upgrading memory bandwidth.
  • The author currently runs Qwen3.5 35B-A3B and 27B variants on a dual AMD 7900 XT setup with roughly 40 GB of VRAM, but finds performance slower than desired for day-to-day coding tasks.
  • Upgrade options include a memory-over-bandwidth path (dual AMD 9700 AI Pro: 64 GB VRAM at 640 GB/s) to fit very large MoE models, or a bandwidth-over-memory path (a single RTX 5090 at ~1800 GB/s) to speed up the 27B model.
  • They are seeking practical advice on which path provides better real-world gains for their workload, weighing larger MoE models against a faster, more compact dense model.

I have a workstation with dual AMD 7900 XTs, so 40 GB of VRAM at 800 GB/s. It runs the likes of Qwen3.5 35B-A3B, a 3-bit version of Qwen-Coder-Next, and Qwen3.5 27B, slowly.

I love 27B; it's almost good enough to replace a subscription for day-to-day coding for me (the things I code are valuable to me but not extremely complex). The speed isn't amazing though… I'm of two minds here: I could either go bigger and reach for the 122B Qwen (and the NVIDIA and Mistral models…), or I could try to speed up the 27B. My upgrade paths:

Memory over bandwidth: dual AMD 9700 AI Pro, 64 GB VRAM and 640 GB/s bandwidth. Great for 3-bit versions of those ~120B MoE models

Bandwidth over memory: a single RTX 5090 with 1800 GB/s bandwidth, which would mean a fast Qwen3.5 27B
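For a rough sense of how the two paths compare, decode speed on a memory-bound model is roughly memory bandwidth divided by the bytes of weights read per token (active parameters × bytes per weight). Below is a minimal back-of-envelope sketch in Python; the 4-bit quant for the dense 27B and the ~10B active parameters for the ~120B MoE are illustrative assumptions, not published specs, and real throughput lands well below these ceilings (especially when split across two cards):

```python
# Back-of-envelope decode-speed ceiling for memory-bound inference:
# tokens/s ≈ bandwidth / (active params × bytes per weight).
# All model figures below are illustrative assumptions, not benchmarks.

def tokens_per_sec(bandwidth_gb_s: float, active_params_b: float,
                   bits_per_weight: float) -> float:
    """Theoretical upper bound on decode tokens/sec."""
    gb_read_per_token = active_params_b * (bits_per_weight / 8)
    return bandwidth_gb_s / gb_read_per_token

# Memory path: dual 9700 AI Pro (640 GB/s), ~120B MoE at 3-bit,
# assuming ~10B active parameters per token (a guess, not a spec).
print(f"~120B MoE @ 3-bit, 640 GB/s:  ~{tokens_per_sec(640, 10, 3):.0f} tok/s")

# Bandwidth path: single RTX 5090 (1800 GB/s), dense 27B at 4-bit;
# a dense model reads all 27B weights every token.
print(f"27B dense @ 4-bit, 1800 GB/s: ~{tokens_per_sec(1800, 27, 4):.0f} tok/s")

# Current rig: dual 7900 XT (800 GB/s), same dense 27B at 4-bit.
print(f"27B dense @ 4-bit, 800 GB/s:  ~{tokens_per_sec(800, 27, 4):.0f} tok/s")
```

The point of the sketch: a big MoE can keep pace on lower-bandwidth cards because only the active experts are read per token, so the answer hinges as much on the 122B model's active-parameter count as on raw bandwidth.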

Any advice?

submitted by /u/Alarming-Ad8154