Tried Qwen3.6-27B-UD-Q6_K_XL.gguf with Claude Code; well, I can't believe it, but it is usable

Reddit r/LocalLLaMA / 4/23/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage

Key Points

  • A user reports running Qwen3.6-27B-UD-Q6_K_XL in GGUF format with a 200K context using llama.cpp on an RTX 5090, achieving about 50 tokens per second.
  • They are surprised that local coding with this model is now “usable,” even if it doesn’t match the polished experience of a top-tier proprietary system.
  • In an initial test of a fairly difficult planning/task setup (not simple CRUD), the model produced a sensible plan on the first attempt.
  • The user notes this impression is preliminary and not a full day-to-day evaluation, but they find it more promising than earlier local coding experiences with other models.
Tried Qwen3.6-27B-UD-Q6_K_XL.gguf with Claude Code; well, I can't believe it, but it is usable

So I tried running Qwen3-27B-UD-Q6_K_XL.gguf with 200K context on my RTX 5090 using llama.cpp. I'm getting around 50 tok/s, which is fine I guess; I don't really know this stuff, so it might be improvable. But what I want to say is: I haven't tried local models for coding in quite a long time, and hell, I can't believe we're at the point where it's actually usable. Of course it's not the same first-class experience as Opus 4.7, but damn, we are getting closer and closer.
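
For anyone who wants to poke at a setup like this without wiring up a full coding agent, here is a minimal sketch of querying a local llama.cpp server through its OpenAI-compatible chat endpoint. The launch flags in the comment, the port, the model path, and the prompt are illustrative assumptions, not details from the post.

```python
# Minimal sketch: talk to a local llama.cpp server via its
# OpenAI-compatible /v1/chat/completions endpoint.
# Assumes llama-server was started with something like (flags illustrative):
#   llama-server -m Qwen3-27B-UD-Q6_K_XL.gguf -c 200000 -ngl 99 --port 8080
import json
import urllib.request

ENDPOINT = "http://127.0.0.1:8080/v1/chat/completions"  # assumed local port

payload = {
    # llama-server serves the single loaded model regardless of this field
    "model": "local",
    "messages": [
        {"role": "system", "content": "You are a senior engineer. Produce an implementation plan."},
        {"role": "user", "content": "Plan a migration of our cron-based jobs to a worker-pool task queue."},
    ],
    "temperature": 0.2,
    "max_tokens": 2048,
}

req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# The response follows the standard OpenAI chat-completions shape.
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

print(body["choices"][0]["message"]["content"])
```

A coding agent sits on top of exactly this kind of endpoint, so a quick request like this is a cheap way to sanity-check throughput and plan quality before pointing a full tool at the local server.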

https://preview.redd.it/3pbvuks69twg1.png?width=2556&format=png&auto=webp&s=0ed498974c33bd33d807bf1b91e310c346f1e69c

I tried quite a difficult task, not casual CRUD stuff, to see if it could even come up with a plan that makes some sense, and it did very well on the first try.

Of course, that's just a general first impression and I haven't done real day-to-day coding with it, but I like what I see, and it looks much more promising than my earlier experience with other models, which would start producing total nonsense at some point.

submitted by /u/Clasyc