
Quick thoughts on Qwen3.5-35B-A3B-UD-IQ4_XS from Unsloth

Reddit r/LocalLLaMA / 3/20/2026

💬 Opinion · Tools & Practical Usage

Key Points

  • The author tested Qwen3.5-35B-A3B-UD-IQ4_XS in the new Oobabooga interface and reports around 100 t/s on a 3090 with almost no preprocessing.
  • It can fit around a 250k context length on the GPU without cache quantization, enabling long-horizon generation.
  • The model sometimes makes mistakes but generally follows directions well and delivers a working product, indicating strong competence for goal-driven tasks.
  • The author anticipates it pairing well with agentic workflows and local tooling (e.g., Cline) due to its speed and large context.
  • A quick CodePen demo (a 3D Snake game) illustrates the model's practical coding ability.

Just some quick thoughts on Qwen3.5-35B-A3B-UD-IQ4_XS after I finally got it working in the new version of Ooba. In short: on a 3090, this thing runs at around 100 t/s with almost no preprocessing time, and it can fit around a 250k context length on the card with no cache quantization. Actual performance is quite good. I always make a quick demo and chuck it on CodePen, and until now I'd been trying and failing to get a local model to make a basic 3D snake game in ThreeJS.
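For anyone wanting to reproduce a similar setup, this can be sketched with the llama.cpp server that backends like Ooba wrap. The model filename and exact flag values below are illustrative assumptions, not the author's actual command:

```shell
# A sketch assuming the llama.cpp server backend.
# -c sets the context length (~250k as described above),
# -ngl 99 offloads all layers to the GPU (e.g. a 3090),
# and f16 cache types keep the KV cache unquantized.
llama-server -m Qwen3.5-35B-A3B-UD-IQ4_XS.gguf \
  -c 250000 -ngl 99 \
  --cache-type-k f16 --cache-type-v f16
```

Whether the full 250k context actually fits in VRAM depends on the model's architecture and quant, so treat the numbers as a starting point rather than a guarantee.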

3D Snake

This sort of thing should be easy, but lots of models refused to make changes without breaking the entire thing, even if I tried reprompting them with a fresh context and as many pointers as I could easily provide. This model was different, though. It made a few mistakes, and it had to spend a while thinking at times, but it actually fixed shit and delivered a working product. I think the best you can hope for with a tiny model is strong competence at following directions and properly executing on a fairly well-defined goal, and this model seems to do that well. I have yet to try it out with Cline, but I suspect it will do fairly well in a proper agentic workflow. Cline is sort of a menace when it comes to hogging context, so I suspect it will be a good pairing with a local model that is competent, really fast, and can fit a huge unquantized context on the GPU.

submitted by /u/EuphoricPenguin22