Over the last few months we've gotten some excellent local models:
- Nemotron Nano 30BA3
- GLM 4.7 Flash
Both of these were very good compared to anything that came before them. With these two, for the first time, I was able to reliably get stuff done (meaning I can look at a task and know, yup, these will be able to do it).
But then came Qwen 3.5 35B. It's smarter overall, speed doesn't degrade with larger context, and the things the other two struggled with, Qwen 3.5 35B nailed with ease. (The task I'm referring to: I gave it a very large homepage config with hundreds of services split between three very similar domains and asked it to categorize all the services by machine, with very confusing names. Previously I had to pull out gpt-oss 120B to get that done.)
With more testing I found the limitations of the 35B, not in any particular task, but in the little things that stack up: when you're vibe coding along and past 80k context, you ask the model to add a particular line of code, the model adds it, everything works, but it added it in the wrong spot. In this case, when I looked back at the instruction I gave, it wasn't clear and I didn't say where exactly I wanted the change (unfair comparison, but if I had given the same instruction to SOTA models they would have gotten it right every time; they just know).
This has been my experience so far.
Given all that, I wanted to ask you guys about your experience: do you think I would see a noticeable improvement with
| Model | Quantization | Speed (t/s) | Context Window | Vision Support | Prompt Processing |
|---|---|---|---|---|---|
| Qwen 3.5 35B | Q8 | 115 | 262k | Yes (mmproj) | 6000 t/s |
| Qwen 3.5 27B | Q8 | 28 | 262k | Yes (mmproj) | 2500 t/s |
| Qwen 3.5 122B | Q4_XS | 37 | 110k | No | 280-300 t/s |
| Qwen 3 Coder | mxfp4 | | 120k | No | 95 t/s |
- Qwen3.5 27B Q8
- Qwen3 coder next 80B MXFP4
- Qwen3.5 coder next 120B Q4_XS
If any of you have used these models extensively for agentic stuff or for coding, how was your experience? And do you think the quality benefit they provide outweighs the speed tradeoff?
Would love to hear any other general advice or other model options you have tried and found useful.
Note: I have a rig with 48GB VRAM
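For anyone wondering whether these even fit in 48GB, here's a rough back-of-the-envelope sizing sketch I use. The bits-per-weight numbers are my own rule-of-thumb figures for common GGUF quants (not official), and it counts weights only, ignoring KV cache and activations, so treat the output as a lower bound:

```python
# Rough weights-only VRAM estimate per model/quant.
# Assumption: ~8.5 bits/weight for Q8_0, ~4.3 for Q4_K_XS, ~4.25 for MXFP4
# (rule-of-thumb numbers, includes quant overhead; KV cache not counted).
BITS_PER_WEIGHT = {"Q8_0": 8.5, "Q4_K_XS": 4.3, "MXFP4": 4.25}

def est_vram_gb(params_b: float, quant: str) -> float:
    """Weights-only footprint in GB for a model with params_b billion params."""
    return params_b * BITS_PER_WEIGHT[quant] / 8

candidates = {
    "Qwen3.5 27B":             (27,  "Q8_0"),
    "Qwen3 Coder Next 80B":    (80,  "MXFP4"),
    "Qwen3.5 Coder Next 120B": (120, "Q4_K_XS"),
}

for name, (params_b, quant) in candidates.items():
    gb = est_vram_gb(params_b, quant)
    verdict = "fits" if gb <= 48 else "needs CPU offload"
    print(f"{name}: ~{gb:.0f} GB weights ({verdict} on 48 GB)")
```

By this estimate the 27B at Q8 and the 80B at MXFP4 squeeze into 48GB (with little room left for context), while the 120B at Q4_XS needs expert offload to RAM, which would explain the much lower prompt-processing speed in my table.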




