I ran two tests on each LLM with OpenCode to check their basic readiness and convenience:

- Create an IndexNow CLI in Golang (easy task)
- Create a migration map for a website following a SiteStructure strategy (complex task)

Tested Qwen 3.5 & 3.6, Gemma 4, Nemotron 3, GLM-4.7 Flash and several other LLMs. Context size used: 25k-50k, varying between tasks and models. The results are in the table below; most of the exact quant names are in the speed test table. Hope you find it useful.

---

In v2 I added tests of:

- Qwen 3.6 35b q3 and q4 => worse result than expected
- Qwen 3 Coder Next => very good result
- Qwen 3.5 27b q3 Bartowski => disappointing

The speed of most of these self-hosted LLMs on an RTX 4080 (16GB VRAM) is listed below, to give you an idea of how fast/slow each model is. I used llama.cpp with the recommended temp, top-p and other sampling params, and default memory and layer params. Fine-tuning these might help you improve speed a bit. Or maybe a bit more than a bit :)

My takeaways from this test iteration:

- Qwen 3.5 27b is a very decent LLM (Unsloth's quants) that suits my hardware well.
- Qwen3 Coder Next is better than Qwen 3.5 and 3.6 35b.
- Qwen 3.5 and 3.6 35b are good, but not good enough for my tasks.
- Both Gemma 4 26b and 31b showed very good results too, though for self-hosting on 16GB VRAM the 31b variant is too big.

---

The details of each LLM's behaviour in each test are here: https://www.glukhov.org/ai-devtools/opencode/llms-comparison/
Tested how OpenCode Works with SelfHosted LLMS: Qwen 3.5, 3.6, Gemma 4, Nemotron 3, GLM-4.7 Flash - v2
Reddit r/LocalLLaMA / 4/22/2026
💬 Opinion · Signals & Early Trends · Tools & Practical Usage · Models & Research
Key Points
- The article reports two practical benchmark tests of the OpenCode workflow with multiple self-hosted LLMs, assessing both an easy coding task and a complex website migration mapping task.
- Models tested include Qwen 3.5/3.6, Gemma 4, Nemotron 3, and GLM-4.7 Flash, with context sizes generally in the 25k–50k range depending on task and model.
- On an RTX 4080 (16GB VRAM) using llama.cpp with recommended sampling parameters and default memory/layer settings, the author reports per-model speed measurements and notes that tuning these parameters could improve speed.
- In the updated v2 results, additional checks show mixed performance: Qwen 3.6 35B quant variants perform worse than expected, Qwen 3 Coder Next performs very well, and the Qwen 3.5 27B q3 (Bartowski) quant disappoints.
- The author’s takeaway is that Qwen 3.5 27B is well-suited to the author’s hardware, Gemma 4 also performs strongly, and some larger variants (e.g., Gemma 4 31B) may be impractical for 16GB VRAM self-hosting.