Tested how OpenCode Works with Self-Hosted LLMs: Qwen 3.5 & 3.6, Gemma 4, Nemotron 3, GLM-4.7 Flash...

Reddit r/LocalLLaMA / 4/6/2026


Key Points

  • The author tested OpenCode’s usability and readiness with multiple self-hosted LLMs using two tasks: a simple IndexNow CLI in Go and a more complex website migration map generation.
  • Models evaluated included Qwen 3.5/3.6, Gemma 4, Nemotron 3, and GLM-4.7 Flash, with context windows ranging roughly from 25k to 50k depending on the task and model.
  • On an RTX 4080 (16GB VRAM) using llama-server with default settings, the reported inference speeds varied across models and may improve with tuning (e.g., memory/layer parameters or other configuration).
  • The takeaway highlights Qwen 3.5 27B as well-suited to the author’s hardware and notes that Gemma 4 26B produced very promising results that warrant further testing.
  • For the two tested tasks, Qwen 3.5 and Gemma 4 were described as comparable to certain cloud-hosted “free” LLM offerings accessible via OpenCode Zen.

I ran two tests on each LLM with OpenCode to check its basic readiness and convenience:

- Create an IndexNow CLI in Golang (easy task), and

- Create a Migration Map for a website following the SiteStructure Strategy (complex task).

Tested Qwen 3.5 and 3.6, Gemma 4, Nemotron 3, GLM-4.7 Flash, and several other LLMs.

Context size used: 25k–50k tokens, varying by task and model.

The results are in the table below; hope you find them useful.

https://preview.redd.it/gdrou1bmdjtg1.png?width=686&format=png&auto=webp&s=026c50e383957c2c526676c10a3c5f12ad705e8e

The speed of most of these self-hosted LLMs on an RTX 4080 (16GB VRAM) is below, to give you an idea of how fast or slow each model is.

Used llama-server with default memory and layer params. Tuning these might help you improve speed a bit. Or maybe more than a bit :)
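For anyone who wants to try tuning beyond the defaults, these are the llama-server knobs that usually matter most for speed on a 16GB card. The model filename and the specific values are illustrative, not what the author ran:

```shell
# -c sizes the context window (the post used 25k-50k);
# -ngl offloads model layers to the GPU (set high, llama.cpp caps it);
# -fa enables flash attention; quantized KV cache (q8_0) frees VRAM,
# which can let you offload more layers or run a longer context.
llama-server -m qwen3.5-27b-q4_k_m.gguf \
  -c 32768 -ngl 99 -fa \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  --port 8080
```

On 16GB of VRAM the usual bottleneck is layers spilling to system RAM, so raising `-ngl` (after freeing VRAM via KV-cache quantization or a smaller context) tends to give the largest speedup.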

https://preview.redd.it/fa3zqfb1ejtg1.png?width=820&format=png&auto=webp&s=deed71b62c203a605dbbcdcee560966ab5030935

---

My Takeaway:

Qwen 3.5 27B is a very decent LLM that suits my hardware well.

The new Gemma 4 26B showed very good results and is worth testing further.

Both are comparable to the cloud-hosted free LLMs from OpenCode Zen, at least for these two tasks.

---

The details of each LLM behaviour in each test are here: https://www.glukhov.org/ai-devtools/opencode/llms-comparison/

submitted by /u/rosaccord