Over the last few months we've gotten some excellent local models:
- Nemotron Nano 30BA3
- GLM 4.7 Flash
Both of these were very good compared to anything that came before them. With these two, for the first time, I was able to reliably get stuff done (meaning I can look at a task and know, yup, these will be able to do it).
But then came Qwen 3.5 35B. It's smarter overall, speed doesn't degrade with larger context, and the things the other two struggled with, Qwen 3.5 35B nailed with ease. (The task I'm referring to: I gave it a very large homepage config with hundreds of services split between three very similar domains and asked it to categorize all the services by machine, with very confusing names. Previously I had to pull out gpt-oss 120B to get that done.)
With more testing I found the limitations of the 35B, not in any particular task, but in the little things that stack up: when you're vibe coding along and past 80k context, you ask the model to add a particular line of code, the model adds it, everything works, but it added it in the wrong spot. In this case, when I looked back at the instruction I gave, it wasn't clear and I didn't say where exactly I wanted the change (unfair comparison, but if I had given the same instruction to SOTA models they would have gotten it right every time; they just know).
This has been my experience so far.
Given all that, I wanted to ask you guys about your experience: do you think I would see a noticeable improvement with
| Model | Quantization | Speed (t/s) | Context Window | Vision Support | Prompt Processing |
|---|---|---|---|---|---|
| Qwen 3.5 35B | Q8 | 115 | 262k | Yes (mmproj) | 6000 t/s |
| Qwen 3.5 27B | Q8 | 28 | 262k | Yes (mmproj) | 2500 t/s |
| Qwen 3.5 122B | Q4_XS | 37 | 110k | No | 280-300 t/s |
| Qwen 3 Coder | mxfp4 | | 120k | No | 95 t/s |
- Qwen3.5 27B Q8
- Qwen3 coder next 80B MXFP4
- Qwen3.5 coder next 120B Q4_XS
If any of you have used these models extensively for agentic stuff or for coding, how was your experience? And do you think the quality benefit they provide outweighs the speed tradeoff?
Would love to hear any other general advice or other model options you have tried and found useful.
Note: I have a rig with 48GB VRAM
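For anyone wondering whether these even fit in 48GB, here's a rough back-of-the-envelope sizing sketch I use. The bits-per-weight numbers are my own rule-of-thumb figures for common GGUF quants (not official), and it counts weights only, ignoring KV cache and activations, so treat the output as a lower bound:

```python
# Rough weights-only VRAM estimate per model/quant.
# Assumption: ~8.5 bits/weight for Q8_0, ~4.3 for Q4_K_XS, ~4.25 for MXFP4
# (rule-of-thumb numbers, includes quant overhead; KV cache not counted).
BITS_PER_WEIGHT = {"Q8_0": 8.5, "Q4_K_XS": 4.3, "MXFP4": 4.25}

def est_vram_gb(params_b: float, quant: str) -> float:
    """Weights-only footprint in GB for a model with params_b billion params."""
    return params_b * BITS_PER_WEIGHT[quant] / 8

candidates = {
    "Qwen3.5 27B":             (27,  "Q8_0"),
    "Qwen3 Coder Next 80B":    (80,  "MXFP4"),
    "Qwen3.5 Coder Next 120B": (120, "Q4_K_XS"),
}

for name, (params_b, quant) in candidates.items():
    gb = est_vram_gb(params_b, quant)
    verdict = "fits" if gb <= 48 else "needs CPU offload"
    print(f"{name}: ~{gb:.0f} GB weights ({verdict} on 48 GB)")
```

By this estimate the 27B at Q8 and the 80B at MXFP4 squeeze into 48GB (with little room left for context), while the 120B at Q4_XS needs expert offload to RAM, which would explain the much lower prompt-processing speed in my table.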




