[7900XT] Qwen3.6 27B for OpenCode

Reddit r/LocalLLaMA / 4/28/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage

Key Points

  • The post seeks guidance on the best way to set up Qwen3.6 27B for OpenCode while working with limited VRAM on an AMD Radeon RX 7900 XT.
  • The author shares a specific llama-server launch configuration (sampling params, cache settings, flash-attn, and a very large context window of 65,536) that currently uses about 18.6/20 GB of VRAM.
  • They estimate roughly 0.5 GB of VRAM headroom remains, leaving room for minor tuning such as context/cache-related adjustments.
  • The author compares the option of using Qwen3.6 35B, noting MoE and possible KV-cache quantization differences, but concludes it likely offers little benefit for their stated goal versus 27B.
  • Overall, the discussion is centered on practical performance/quality tuning for running Qwen-class models locally under VRAM constraints.

I'm just looking for some advice on optimally setting up Qwen3.6 27B for OpenCode. VRAM is a bit scarce, but this is what I've ended up with so far:

llama-server --model models/Qwen3.6-27B-IQ4_XS.gguf \
  --port 8080 \
  --host 127.0.0.1 \
  --top-p 0.95 \
  --top-k 20 \
  --min-p 0.0 \
  --temperature 0.6 \
  --flash-attn on \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --presence-penalty 0.0 \
  --repeat-penalty 1.0 \
  --ctx-size 65536 \
  --chat-template-kwargs '{"preserve_thinking": true}'
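
In case it helps anyone, this is roughly how I'd point OpenCode at that server via its OpenAI-compatible /v1 endpoint. It's a sketch based on OpenCode's custom-provider config; the provider and model IDs ("llama-server", "qwen3.6-27b") are placeholders I made up, so check the exact schema against your OpenCode version:

# Sketch: write an OpenCode custom-provider config pointing at llama-server's
# OpenAI-compatible API. Provider/model IDs below are placeholders, not
# anything OpenCode ships with; verify the schema against your version.
mkdir -p ~/.config/opencode
cat > ~/.config/opencode/opencode.json <<'EOF'
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "llama-server": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "llama.cpp (local)",
      "options": { "baseURL": "http://127.0.0.1:8080/v1" },
      "models": { "qwen3.6-27b": { "name": "Qwen3.6 27B IQ4_XS" } }
    }
  }
}
EOF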

With the launch command above, my VRAM usage is around 18.6/20 GB, so I could potentially stretch it by about 0.5 GB.
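
Most of the budget beyond the weights is KV cache. A back-of-envelope check shows why q8_0 cache matters at 64K context; the layer/head numbers below are illustrative placeholders, not the actual Qwen3.6 27B dims (read the real values from the GGUF metadata):

# Rough KV-cache sizing: 2 (K+V) * layers * kv_heads * head_dim * ctx * bytes/elem.
# LAYERS/KV_HEADS/HEAD_DIM are made-up placeholders, NOT the real Qwen3.6 27B
# config; q8_0 stores 34 bytes per 32-element block (~1.06 bytes/elem).
LAYERS=48; KV_HEADS=8; HEAD_DIM=128; CTX=65536
echo "f16 : $(( 2 * LAYERS * KV_HEADS * HEAD_DIM * CTX * 2 / 1048576 )) MiB"
echo "q8_0: $(( 2 * LAYERS * KV_HEADS * HEAD_DIM * CTX * 34 / 32 / 1048576 )) MiB"

With those (made-up) dims that's ~12 GiB of cache at f16 versus ~6.5 GiB at q8_0, which is the kind of saving that makes the fit possible at this context size.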

Of course there is also Qwen3.6 35B, which thanks to MoE could fit without KV-cache quantization at Q4_K_M, or even Q4_K_XL or maybe Q5, but I don't think it would offer any benefit over 27B for this goal.
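
Either way, before wiring up OpenCode, a quick smoke test of the endpoint doesn't hurt; llama-server serves /v1/chat/completions on the same port (jq is just for readability):

# Smoke-test the OpenAI-compatible endpoint and print the reply text.
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"user","content":"Reply with one word."}],"max_tokens":16}' \
  | jq -r '.choices[0].message.content'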

submitted by /u/Mordimer86