Nothing extensive to see here, just a quick qualitative and performance comparison for a single programming use case: making an ancient website that uses Flash for everything work in modern browsers. I let all three models tackle exactly the same issue and gave them exactly the same multi-turn feedback.
- Gemma 4 and Qwen 3.6 both nailed the first issue in a functionally equivalent way and provided useful additional feedback.
- Q3CN (Qwen3-Coder-Next) went for a more convoluted fix.
- All three missed a remaining breaking issue after the proposed fix.
- Gemma 4 then made a simple, spot-on fix.
- Qwen 3.6 also pointed the issue out, though less cleanly, and then solved it in a rather convoluted way that suggested it understood the problem less well than Gemma 4 did.
- Q3CN proposed a very convoluted fix that missed the actual issue.
Note that all models were prompted directly via the completions API, outside of an agentic harness. This put Q3CN at a disadvantage, since it is a non-reasoning model and wasn't prompted for even basic CoT.
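For context, "prompted directly via the completions API" here means something like the following minimal Python sketch against a local llama-server instance, using llama.cpp's native /completion endpoint. The URL, prompt contents, and n_predict cap are placeholders, not the exact values used:

```python
# Minimal sketch of a raw completions call to a local llama-server.
# Endpoint and fields follow llama.cpp's native /completion API;
# the URL, prompt text, and n_predict cap are placeholders.
import requests

def complete(prompt: str) -> str:
    resp = requests.post(
        "http://localhost:8080/completion",  # default llama-server port
        json={
            "prompt": prompt,   # full conversation so far, as raw text
            "temperature": 0,   # matches the --temp 0 setting below
            "n_predict": 8192,  # generous cap for verbose models
        },
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()["content"]

# Multi-turn feedback is just the previous transcript plus the new turn,
# re-sent as one growing prompt (no agentic harness, no injected CoT).
transcript = "<task description and Flash-site source files here>"
answer = complete(transcript)
transcript += answer + "\n<follow-up feedback here>"
```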
| Metric | gemma-4-31B-it-UD-Q4_K_XL (18.8 GB) | Qwen3.6-35B-A3B-UD-Q5_K_XL (26.6 GB) | Qwen3-Coder-Next-UD-Q4_K_XL (49.6 GB) |
|---|---|---|---|
| Initial prompt tokens | 60178 | 53063 | 50288 |
| Prompt speed (tps) | 642 | 2130 | 801 |
| Total prompt time (s) | 93 | 25 | 64 |
| Generated tokens | 1938 | 5437 | 1076 |
| Response speed (tps) | 13 | 66 | 40 |
| Total response time (s) | 151 | 82 | 27 |
| **Next turn** | | | |
| Generated tokens | 4854 | 12027 | 1195 |
| Response speed (tps) | 12 | 59 | 34 |
| Total response time (s) | 396 | 204 | 35 |
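As a quick sanity check on the columns, each total-time row is simply tokens divided by the corresponding tps figure; for example, with the Qwen 3.6 first-turn numbers from the table:

```python
# Cross-checking the table: total time = tokens / tokens-per-second.
# Values below are the Qwen 3.6 first-turn numbers from the table above.
prompt_time = 53063 / 2130   # ~24.9 s, table reports 25 s
response_time = 5437 / 66    # ~82.4 s, table reports 82 s
print(round(prompt_time), round(response_time))  # 25 82
```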
Some observations:
- Qwen 3.6 is the most verbose, including in its reasoning, but it still finishes faster than Gemma 4 thanks to its much higher TPS.
- Qwen 3.6 clearly wins the prompt-processing category.
- Q3CN is the fastest overall despite being much larger, thanks to far lower verbosity; skipping reasoning saves tokens but seems to cost capability.
- In an agentic setting outside this test, I found that Gemma 4 deals noticeably better with complex and conflicting information in coding and debugging scenarios. That might be a dense-vs-MoE effect.
All tests ran on the latest llama.cpp with 24 GB of VRAM and partial offload (chosen by automated fitting), using these options: `-fa on --temp 0 -np 1 -c 80000 -ctv q8_0 -ctk q8_0 -b 2048 -ub 2048`
(Yes, I'm aware that temp 0 isn't recommended, but it currently works nicely for me.)
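For anyone wanting to reproduce the setup, here is a hedged sketch of that launch wrapped in Python. The model path is a placeholder, and the automated VRAM fitting (i.e. the choice of `-ngl`) is not reproduced:

```python
# Sketch of launching llama-server with the options listed above.
# The model path is a placeholder; use whichever GGUF you are testing.
# Partial offload (the -ngl value) is left out, since the post used an
# automated fitting step that isn't reproduced here.
import subprocess

MODEL = "models/Qwen3.6-35B-A3B-UD-Q5_K_XL.gguf"  # placeholder path

subprocess.run([
    "llama-server", "-m", MODEL,
    "-fa", "on",     # flash attention
    "--temp", "0",   # deterministic-ish sampling, as used in the test
    "-np", "1",      # a single parallel slot
    "-c", "80000",   # 80k-token context window
    "-ctk", "q8_0",  # quantized KV cache, keys
    "-ctv", "q8_0",  # quantized KV cache, values
    "-b", "2048",    # logical batch size
    "-ub", "2048",   # physical (micro) batch size
], check=True)
```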