AI Navigate

24GB VRAM users, have you tried Qwen3.5-9B-UD-Q8_K_XL?

Reddit r/LocalLLaMA / 3/21/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage · Models & Research

Key Points

  • The author reports that the 9B UD-Q8_K_XL variant delivers better quality and faster performance than the 27B Q4_K_XL and Q5_K_XL for non-coding tasks.
  • They were able to pair Qwen3-TTS with this model using a custom Scarlett Johansson voice, with notably fast responses after the initial prompt load.
  • In their tests, using the same context size for 27B and 9B, the 9B 8-bit quant appears to outperform the 27B's 4- or 5-bit quantization for general-purpose use.
  • They would consider adding a second GPU to run the 27B at 8-bit and asked others if they've seen similar results.

My own testing has me somewhat convinced that, for non-coding work, the 9B at UD-Q8_K_XL is better than the 27B at Q4_K_XL and Q5_K_XL. To me, going to the highest quant really showed itself: good-quality results, and faster. Not only that, I am able to pair Qwen3-TTS with it and use a custom voice (I am using Scarlett Johansson's voice). Once the first prompt is loaded and the voice is called, it is really fast. I was testing with the same context size for the 27B and the 9B.

This is mostly about how the quality of the higher-end 9B 8-bit quant felt better for general-purpose use compared to the 4- or 5-bit quants of the 27B. It makes me want to get another GPU to add to my 3090 so that I can run the 27B at 8-bit.
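The weight-memory arithmetic behind the second-GPU wish can be sketched as below. The bits-per-weight figures are rough assumptions on my part, not exact values: real GGUF quants (including Unsloth's UD-Q8_K_XL) mix bit widths per tensor, and the KV cache and activations add overhead on top of the weights.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB: params * bits / 8."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Rough comparison (bits-per-weight values are illustrative assumptions):
for label, params, bpw in [
    ("9B @ ~8-bit", 9, 8.5),
    ("27B @ ~4-bit", 27, 4.5),
    ("27B @ ~5-bit", 27, 5.5),
    ("27B @ ~8-bit", 27, 8.5),
]:
    print(f"{label}: ~{weight_gb(params, bpw):.1f} GB weights")
```

Under these assumptions the 9B at ~8-bit sits near 9.6 GB and the 27B at 4- or 5-bit around 15-19 GB, all within a 24 GB card, while the 27B at ~8-bit lands near 29 GB of weights alone, which is why a single 3090 can't run it and a second GPU would be needed.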

Has anyone seen anything similar?

submitted by /u/Prestigious-Use5483