I tested Qwen3.6-27B, Qwen3.6-35B-A3B, Qwen3.5-27B and Gemma 4 on the same real architecture-writing task on an RTX 5090

Reddit r/LocalLLaMA / 4/23/2026


Key Points

  • The author ran a local, side-by-side evaluation of four LLMs (Gemma 4, Qwen3.6-27B, Qwen3.6-35B-A3B, and Qwen3.5-27B) on the same architecture-document writing task using an RTX 5090.
  • The test involved generating a unified Masterplan.md from two blueprint documents (a ~16k-token V1 and a ~4.6k-token V2, combined ~20.6k tokens) and assessing outputs across clarity, completeness, discipline, and usefulness.
  • For clarity and discipline, Gemma 4 scored highest, producing the cleanest structure, best pacing, and strongest restraint in preserving the product identity.
  • For completeness, Qwen3.6-35B-A3B substantially led, generating the most exhaustive architecture document and the greatest “implementation mass.”
  • The workflow used a consistent Hermes-based writing agent (“Scribe”) with a second GPT-5.4 agent (“Manny”) to direct and review each stage (initial draft, revision, and final polish) for a more controlled comparison.

I ran a pretty simple but revealing local-LLM test.

At first I was only going to post about the two Qwens and Gemma 4 and go to bed, but what do you know: I go on Reddit and see a post that Qwen3.6-27B dropped. Oh well...

Models tested:

  • Gemma4
    • cyankiwi/gemma-4-31B-it-AWQ-4bit
  • Qwen3.6-35B
    • RedHatAI/Qwen3.6-35B-A3B-NVFP4
  • Qwen3.5-27B
    • QuantTrio/Qwen3.5-27B-AWQ
  • Qwen3.6-27B
    • cyankiwi/Qwen3.6-27B-AWQ-INT4

Context: I’m working on a fairly complex tool that takes noisy evidence and turns it into a structured “truth report.”

I gave the same Hermes writing agent (“Scribe”) the same task:

take 2 architecture blueprint docs (v1 baseline + v2 expansion) describing the "truth engine" and produce a unified `Masterplan.md` explaining:

- what the product is

- the user problem

- UX/product shape

- UVP/moat

- pipeline

- agent roles

- architecture

- trust/legal/provenance posture

- what changed between plan V1 and V2

V1: ~16k tokens

V2: ~4.6k tokens

Combined: ~20.6k tokens

Then I ran the full workflow locally on my RTX 5090 with all 4 models:

- **Gemma4**
- **Qwen3.6-35B**
- **Qwen3.5-27B**
- **Qwen3.6-27B**

To make it fair and push the models, each model got:

  1. initial draft

  2. second-pass revision

  3. final polish

Each stage was directed and reviewed by my GPT-5.4 agent Manny, so this wasn’t just “ask once and compare vibes.”
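The post doesn’t share the actual harness, but the control flow it describes (Scribe drafts, Manny directs and reviews each stage) can be sketched in a few lines. The function name, stage labels as prompt prefixes, and the callable interfaces for the two agents are all assumptions for illustration:

```python
# Minimal sketch of the three-stage workflow described above. "scribe" (the
# local writing model) and "reviewer" (the GPT-5.4 agent "Manny") are
# stand-in callables; the prompt format is an assumption, not the author's.

def run_workflow(scribe, reviewer, task,
                 stages=("initial draft", "second-pass revision", "final polish")):
    doc = ""
    review_log = []
    for stage in stages:
        # Scribe produces or rewrites the document for this stage...
        doc = scribe(stage=stage, task=task, previous=doc)
        # ...and the reviewer agent critiques it before the next pass.
        review_log.append((stage, reviewer(stage, doc)))
    return doc, review_log

# Toy agents, just to show the control flow end to end:
toy_scribe = lambda stage, task, previous: f"{previous}[{stage}: {task}]"
toy_reviewer = lambda stage, doc: f"reviewed {stage}"
final_doc, log = run_workflow(toy_scribe, toy_reviewer, "Masterplan.md")
```

With real agents, `scribe` would be a call to the local model and `reviewer` a call to the frontier model, with the review feedback folded into the next stage’s prompt.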

## What I/Manny scored

- **Clarity**

- **Completeness**

- **Discipline**

- **Usefulness**

## Final results

### Clarity

- Gemma4: **9.4**

- Qwen3.6-27B: **8.8**

- Qwen3.6-35B: **8.1**

- Qwen3.5-27B: **7.4**

**Winner: Gemma4** (at a cost; more on that below)

Gemma was the best editor. Cleanest structure, best pacing, strongest restraint.

---

### Completeness

- Qwen3.6-35B: **9.6**

- Qwen3.5-27B: **9.1**

- Qwen3.6-27B: **8.7**

- Gemma4: **7.9**

**Winner: Qwen3.6-35B**

The 35B Qwen wrote the most exhaustive architecture doc by far. Best sourcebook, most implementation mass.

---

### Discipline

- Gemma4: **9.5**

- Qwen3.6-27B: **8.6**

- Qwen3.6-35B: **7.7**

- Qwen3.5-27B: **6.8**

**Winner: Gemma4**

Gemma best preserved the actual product identity.

---

### Usefulness

- Qwen3.6-27B: **9.3**

- Qwen3.6-35B: **9.2**

- Gemma4: **8.9**

- Qwen3.5-27B: **8.8**

**Winner: Qwen3.6-27B**

This was the surprise. The 27B Qwen 3.6 ended up as the best **overall practical workhorse** — better balance of depth, readability, and usability than the others.

## Final ranking

1. **Qwen3.6-27B** — best all-around balance

2. **Gemma4** — best editor / strategist

3. **Qwen3.6-35B** — best exhaustive drafter

4. **Qwen3.5-27B** — solid, but clearly behind the others for this task
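For what it’s worth, an unweighted mean of the four category scores (copied from the result tables above) can be cross-checked in a few lines. Notably, a plain average would put Gemma4 narrowly first, so the final ranking clearly rewards balance and practical usefulness rather than the raw mean:

```python
# Category scores copied from the result tables above:
# [clarity, completeness, discipline, usefulness]
scores = {
    "Gemma4":      [9.4, 7.9, 9.5, 8.9],
    "Qwen3.6-27B": [8.8, 8.7, 8.6, 9.3],
    "Qwen3.6-35B": [8.1, 9.6, 7.7, 9.2],
    "Qwen3.5-27B": [7.4, 9.1, 6.8, 8.8],
}

# Unweighted mean per model, highest first.
means = {model: sum(s) / len(s) for model, s in scores.items()}
for model, avg in sorted(means.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:12s} {avg:.3f}")
# Gemma4       8.925
# Qwen3.6-27B  8.850
# Qwen3.6-35B  8.650
# Qwen3.5-27B  8.025
```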

1) Best overall balance

**Qwen3.6-27B**

This is the new interesting winner.

It doesn’t beat Gemma4 on clarity or discipline.
It doesn’t beat Qwen3.6-35B on completeness.

But it wins the thing that matters most for a real working master plan: balance. It’s the best compromise between:

  • readability
  • completeness
  • structure
  • practical usefulness

2) Best editor / best strategist

**Gemma4**

If the goal is:

  • cleanest finished document
  • strongest executive readability
  • best restraint
  • best “this feels like a real deliberate plan”

Then Gemma still wins.

3) Best exhaustive architecture quarry

**Qwen3.6-35B**

If the goal is:

  • maximum implementation mass
  • biggest architecture sourcebook
  • richest mining material for downstream docs

Then Qwen3.6-35B is still the beast.

4) Fourth place

**Qwen3.5-27B**

Not bad. Not embarrassing.
But now clearly behind both Qwen3.6 variants and Gemma for this kind of long-form architecture/planning task.

## Actual takeaway

This ended up being a really clean split:

- **Gemma4 = best editor**

- **Qwen3.6-35B = best expander**

- **Qwen3.6-27B = best practical default**

- **Qwen3.5-27B = respectable, but not the winner**

So if I were setting a default local writing worker for long-form architecture/master-plan work today, I’d probably choose:

**Qwen3.6-27B**

It’s the best compromise between:

- readability

- completeness

- structure

- practical usefulness

Personal note re Gemma4: its final output was drastically shorter than the Qwens’:

  • Gemma4: 147 lines
  • Qwen3.6-35B: 725 lines
  • Qwen3.5-27B: 840 lines
  • Qwen3.6-27B: 555 lines

So while I do agree that less is often more, I found the Gemma4 output lacking in both technical depth and detail. Sure, it captured the core concepts, but I would position the output as more of a pitch deck or high-level concept; technical details and concepts are sorely missing.

On the other end of the spectrum is Qwen3.6-35B, which delivered roughly 5x the volume. That document could genuinely serve as a technical blueprint and architecture implementation bible. Qwen3.5-27B produced even more, but that was quantity over quality.

I would honestly have rated Gemma4 less favourably than Manny did, so make of that what you will.
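The "5x the volume" figure checks out against the line counts above:

```python
# Final-output line counts from the list above.
lines = {"Gemma4": 147, "Qwen3.6-35B": 725, "Qwen3.5-27B": 840, "Qwen3.6-27B": 555}

# Qwen3.6-35B's output relative to Gemma4's.
ratio = lines["Qwen3.6-35B"] / lines["Gemma4"]
print(f"{ratio:.1f}x")  # -> 4.9x
```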

For first-draft-only performance, I’d rank them:

One-shot ranking

  1. Qwen3.6-27B
  2. Qwen3.6-35B
  3. Qwen3.5-27B
  4. Gemma4

Why

1) Qwen3.6-27B

Best balance right out of the gate:

  • strong product framing
  • solid structure
  • good density
  • less bloated than the other Qwens
  • more complete than Gemma’s first draft

This was the best raw first shot.

2) Qwen3.6-35B

Very strong one-shot draft, but more sprawling:

  • most exhaustive
  • richest implementation mass
  • more likely to over-include
  • better sourcebook than polished masterplan on first pass

If you want maximum raw material, this one was a beast.

3) Qwen3.5-27B

Good first-draft generator, but sloppier:

  • ambitious
  • broad
  • lots of content
  • weaker discipline and coherence than the 3.6 models

Still useful, but clearly behind both 3.6 variants.

4) Gemma4

Gemma (arguably) won the final polished-document contest, but not the first-draft contest. Its one-shot behaviour was:

  • too compressed
  • too selective
  • not thorough enough for the initial task

It needed the later revision passes to get more substance. Depending on the audience, this may be either good or bad.

Short version

  • Best one-shot: Qwen3.6-27B
  • Best after revision/polish: Gemma4
submitted by /u/Gazorpazorp1