This weekend I wanted to test how well a local LLM can work as the primary model for an agentic coding assistant like OpenCode or OpenAI Codex. I picked Qwen3.5-27B, a hybrid-architecture model that has been getting a lot of attention lately for its performance relative to its size, and ran it with OpenCode to see how far it could go. The model runs via llama.cpp on my NVIDIA RTX 4090 (24GB) workstation, with OpenCode on my MacBook connecting over Tailscale.
Based on my testing, setting up the whole workflow was a great learning experience in itself. It is one thing to use a local model as a chat assistant and another to use it with an agentic coding assistant, especially getting tool calling and correct agentic behavior working. You have to make a lot of decisions: the right quantization that fits on your machine, the best model in that size category, the correct chat template for tool calling, and the best context size and KV-cache settings. I also wrote a detailed blog covering the full setup step by step, along with all the gotchas and practical tips I learned. Happy to answer any questions about the setup. Blog post: https://aayushgarg.dev/posts/2026-03-29-local-llm-opencode/
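The post links a full walkthrough for the details; as a rough sketch only (the exact command is not in the excerpt, and the GGUF filename is an assumption), a llama.cpp serving setup matching the described hardware and settings might look like:

```shell
# Sketch, not the author's exact command. Standard llama-server flags:
#   -c 65536              64K context window
#   -ngl 99               offload all layers to the 24 GB RTX 4090
#   --cache-type-k/-v     quantize the KV cache (q8_0) to save VRAM
#   --jinja               apply the model's chat template (needed for tool calls)
llama-server \
  -m Qwen3.5-27B-Q4_K_M.gguf \
  -c 65536 \
  -ngl 99 \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  --jinja \
  --host 0.0.0.0 --port 8080
```

Binding to `0.0.0.0` is what lets OpenCode on the MacBook reach the workstation over Tailscale.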
Running Qwen3.5-27B locally as the primary model in OpenCode
Reddit r/LocalLLaMA / 3/30/2026
💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage
Key Points
- A tester ran the Qwen3.5-27B LLM locally as the primary model for an agentic coding assistant (OpenCode/Codex-style workflow) to evaluate real coding and tool-calling performance.
- On an NVIDIA RTX 4090 (24GB) using llama.cpp with a 4-bit quantized, 64K-context setup, they reported roughly 2,400 tok/s prefill and ~40 tok/s generation while using OpenCode over Tailscale from a MacBook.
- The model performed surprisingly well for agentic tasks such as writing multiple Python scripts, making edits, debugging, testing, and executing code with correct tool calling.
- Performance improved further when adding agent skills and using Context7 as an MCP server to pull up-to-date documentation, but it was not ideal for “vibe coding” with loose prompts.
- The author emphasizes that achieving good agent behavior requires careful decisions around quantization, model/chat templates for tool calling, context size, and KV cache settings, and they published a step-by-step blog with practical gotchas.
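The key points above describe OpenCode driving correct tool calls through a locally served model. As an illustration of the mechanics (the server URL and the `run_python` tool are assumptions for this sketch, not from the post), an OpenAI-compatible tool-advertising request to a llama.cpp endpoint can be built with nothing but the standard library:

```python
import json
import urllib.request

# Assumed local llama-server address; not specified in the post.
LLAMA_SERVER = "http://localhost:8080/v1/chat/completions"


def build_tool_call_request(prompt: str) -> dict:
    """Build an OpenAI-compatible chat payload that advertises one tool.

    The tool schema follows the OpenAI function-calling format, which
    llama.cpp's server accepts when launched with --jinja. The
    `run_python` tool here is purely illustrative.
    """
    return {
        "messages": [{"role": "user", "content": prompt}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "run_python",
                    "description": "Execute a Python snippet and return stdout",
                    "parameters": {
                        "type": "object",
                        "properties": {"code": {"type": "string"}},
                        "required": ["code"],
                    },
                },
            }
        ],
    }


def send(payload: dict) -> dict:
    """POST the payload to the local server and decode the JSON reply."""
    req = urllib.request.Request(
        LLAMA_SERVER,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With the server running, `send(build_tool_call_request("List the repo's Python files"))` should return a message whose `tool_calls` entry names `run_python` with model-generated arguments; an agent loop like OpenCode's then executes the call and feeds the result back as a `tool`-role message.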
Related Articles
- Black Hat Asia (AI Business)
- Mr. Chatterbox is a (weak) Victorian-era ethically trained model you can run on your own computer (Simon Willison's Blog)
- Beyond the Chatbot: Engineering Multi-Agent Ecosystems in 2026 (Dev.to)
- I missed the "fun" part in software development (Dev.to)
- The Billion Dollar Tax on AI Agents (Dev.to)