In my current workflow (Python/C++ coding and technical reports) I mostly use Qwen3.6 27B and Gemma4 31B. In the past I tried other models like Deepseek with decent results, but it was painfully slow... so do you think there is some model that I'm missing and should try?
EDIT: to be clear, I'm not asking how to make those models run faster, I'm asking which other models I should try. Telling me to try them all doesn't help: first, there are a bazillion models available and nobody on earth could reasonably try them all; second, if I were willing to try them all I wouldn't have asked here. If I see a model using more VRAM than available, I already scale down, either the quantization or the model size itself if possible, or I abandon the model because it's too slow.
System specs: MI50 32GB + V100 32GB. And going below 10 tps in real-world usage counts as "painfully slow".
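For the "scale down the quantization if it won't fit" step, here's a rough back-of-envelope sketch I use to sanity-check whether a model fits before downloading. The bits-per-weight figures and the flat overhead allowance are my own illustrative assumptions, not exact numbers for any particular runtime:

```python
def est_vram_gb(params_b, bits_per_weight, overhead_gb=2.0):
    """Rough VRAM estimate: weights at the given quantization plus a
    flat allowance for KV cache and runtime buffers (overhead_gb is
    a guess and varies with context length and backend)."""
    weight_gb = params_b * bits_per_weight / 8  # billions of params -> GB
    return weight_gb + overhead_gb

# e.g. a 27B model at roughly 4.5 bits/weight (typical of 4-bit K-quants)
print(round(est_vram_gb(27, 4.5), 1))  # ~17.2 GB -> fits in 32 GB
```

It's only a first-order estimate, but it tells you quickly whether a given size/quant combination is even worth trying on a 32 GB card.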

