The following is a non-comprehensive test I came up with to measure the quality difference (a.k.a. degradation) between different quantizations of Qwen 3.6 27B. I want to figure out the best quant to run on my 16 GB VRAM setup.

WHAT WE ARE TESTING

First, the prompt: I want to see if the models can:
And yes, in case you are wondering: it is possible that the model was trained on existing chess games, so I came up with some random moves, the kind no player above 300 Elo would ever play. For those who are not chess players, this is how the board is supposed to look after move 7. h4. Btw, you are supposed to look at the piece positions and the board orientation, not the image quality, because this is just a screenshot from Lichess.

CAN OTHER MODELS SOLVE IT?

Before we get to the main part, let me show the results from some other models. I find it interesting that not many models were able to figure out the board state, let alone render it correctly.

Qwen 3.5 27B: It mostly figured out the final position of the pieces, but still rendered the original board state on top, highlighted the wrong squares, and got the board orientation wrong.

Gemma 4 31B: Nice chess.com flagship board style. I would say it figured out the board state but failed to render it correctly, and the square pattern is also messed up.

Qwen3 Coder Next: I don't know what to say. Quite disappointed.

Qwen3.6 35B A3B: As expected, the 35B A3B is always the fastest Qwen model, but at the same time it managed to fail the task successfully in many different ways. This is why I decided to find a way to squeeze 27B into my 16 GB card; the speed alone is just not worth it.

HOW DOES QWEN3.6 27B SOLVE IT?

All the models here are tested with the same set of llama.cpp parameters:
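Since all the failure modes above come down to piece placement, here is a minimal sketch of how such a check could be automated. This is my own illustration, not the author's actual test harness; the FEN strings are standard chess notation, and the "after 1. e4" position is just a demo stand-in for a model's answer.

```python
def fen_to_squares(fen: str) -> dict:
    """Map square names like 'e4' to piece letters from a FEN board field."""
    placement = fen.split()[0]           # first FEN field = piece placement
    squares = {}
    for rank_idx, row in enumerate(placement.split("/")):  # ranks 8 down to 1
        file_idx = 0
        for ch in row:
            if ch.isdigit():
                file_idx += int(ch)      # a digit is a run of empty squares
            else:
                square = "abcdefgh"[file_idx] + str(8 - rank_idx)
                squares[square] = ch
                file_idx += 1
    return squares

def placement_accuracy(truth_fen: str, answer_fen: str) -> float:
    """Fraction of occupied ground-truth squares that the answer gets right."""
    truth = fen_to_squares(truth_fen)
    answer = fen_to_squares(answer_fen)
    hits = sum(1 for sq, piece in truth.items() if answer.get(sq) == piece)
    return hits / len(truth)

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
after_e4 = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq - 0 1"
print(placement_accuracy(start, after_e4))  # 0.96875 — 31 of 32 squares match
```

The same score function would work on a position extracted from a model's SVG, although mapping SVG piece glyphs back to squares is the harder part and is not shown here.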
The BF16 version was from OpenRouter, the Q8 to Q4_K_XL versions ran on an L40S server, and the rest on my RTX 5060 Ti. The SVG code was generated directly in the llama.cpp Web UI without any tools or MCP enabled (I originally ran this test in Pi agent, only to find out that the model tried to peek into the parent folders, found the existing SVG diagrams from higher quants, and copied most of them).

BF16 - Full precision: This is the baseline of this test. It has everything I needed: right positions, right board orientation, right piece colors, right highlights. The dotted blue line was unexpected, but it is also interesting, because as you will see later, not many of the higher quants generate it.

Q8_0: As expected, Q8 retains pretty much everything from full precision except the line.

Q6_K: We start to see some quality loss here, namely the placement of the rank 5 pawns. The look of the pieces differs mostly because Q6 decided to use a different font; none of the models in this test tried to draw its own pieces.

Q5_K_XL: Looks very similar to Q8, but it is worth noting that the SVG code of the Q5 version is 7.1 KB, while Q8's is 4.7 KB.

Q4_K_XL and IQ4_XS: If you ignore the font choice, Q4_K_XL is the more complete solution, because it has the board coordinates.

Q3_K_XL and Q3_K_M

IQ3_XXS: Now here's the interesting part: everything was mostly correct, the piece placements and the highlight, and there's the line on the last move! But IQ3_XXS got the board orientation wrong; see the light square on the bottom left?

Q2_K_XL: This is just a waste of time. But hey, it got all the piece positions right; the board is just not aligned at all.

SO, WHAT DO I USE?

I know a single test is not enough to draw any conclusions. But personally, I will never go for anything below IQ4_XS after this test (I had bad experiences with Q3_K_XL and below in other tries). On my RTX 5060 Ti, I got about pp 100 tps and tg 8 tps for IQ4_XS with vanilla llama.cpp (q8 for both ctk and ctv, fa on).
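About the IQ3_XXS orientation mistake: with white at the bottom, a1 (bottom left) must be a dark square and h1 (bottom right) a light one, so a light square in the bottom-left corner means the board is flipped. The parity rule below is standard chess board coloring; the checker itself is my own sketch, not part of the author's test.

```python
def is_light_square(square: str) -> bool:
    """A square is light when file index + rank index is odd (a1 is dark)."""
    file_idx = "abcdefgh".index(square[0])
    rank_idx = int(square[1]) - 1
    return (file_idx + rank_idx) % 2 == 1

# Correct orientation: dark bottom-left (a1), light bottom-right (h1).
print(is_light_square("a1"), is_light_square("h1"))  # False True
```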
But with TheTom's TurboQuant fork, I managed to get up to pp 760 tps and tg 22 tps by forcing GPU offload for all layers (`-ngl 99`), which is quite usable. The only downside is that I have to keep the context window below 75k and use turbo4/turbo2 for the KV cache quant. Below are some examples of different KV cache quants. You can see all the results directly here: https://qwen3-6-27b-benchmark.vercel.app/
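For anyone reproducing the 16 GB budget math: the weight footprint is roughly parameter count × bits-per-weight / 8. The bits-per-weight figures below are approximate averages for common llama.cpp quant types, the 27B parameter count is taken at face value, and the estimate ignores KV cache and activation overhead, so treat it as a lower bound rather than an exact GGUF file size.

```python
PARAMS = 27e9  # nominal parameter count; real GGUF sizes vary slightly per tensor

# Approximate average bits per weight for common llama.cpp quant types.
BPW = {"Q8_0": 8.5, "Q6_K": 6.56, "Q5_K": 5.5, "IQ4_XS": 4.25, "Q3_K": 3.44, "Q2_K": 2.63}

for name, bpw in BPW.items():
    gib = PARAMS * bpw / 8 / 2**30  # bytes -> GiB
    print(f"{name:7s} ~{gib:5.1f} GiB")
```

This lines up with the author's pick: IQ4_XS weights land around 13–14 GiB, the largest of these quants that still leaves room for a quantized KV cache on a 16 GB card, while Q8_0 and Q6_K clearly do not fit.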
Quality comparison between Qwen 3.6 27B quantizations (BF16, Q8_0, Q6_K, Q5_K_XL, Q4_K_XL, IQ4_XS, IQ3_XXS,...)
Reddit r/LocalLLaMA / 5/6/2026
💬 Opinion · Tools & Practical Usage · Models & Research
Key Points
- The post compares different quantization levels of the Qwen 3.6 27B model (e.g., BF16, Q8_0, Q6_K, Q5_K_XL, Q4_K_XL, IQ4_XS, IQ3_XXS) to see how much quality degrades under lower-precision formats.
- The author uses a chess-specific prompt based on a PGN game with unusual moves to test whether models can accurately track board state across moves and render the correct SVG with the last move highlighted.
- The evaluation emphasizes correctness of piece placement and board orientation rather than the aesthetic quality of the generated image, using a reference screenshot from Lichess.
- Early observations suggest many models struggle to infer the correct board state and produce a properly rendered SVG, motivating the search for the best quantization that still works on a 16GB VRAM setup.
- Additional model results (e.g., Qwen 3.5 27B and Gemma 4 31B) are shown to illustrate common failure modes, such as keeping the wrong original board state or highlighting incorrect squares.