Qwen3.6 35B-A3B is quite useful on 780m iGPU (llama.cpp,vulkan)

Reddit r/LocalLLaMA / 4/24/2026

💬 OpinionDeveloper Stack & InfrastructureSignals & Early TrendsTools & Practical Usage

共有:

Key Points

A user reports that the Qwen3.6 MoE model (Qwen3.6-35B-A3B) runs effectively on an AMD Radeon 780M iGPU using llama.cpp with the Vulkan backend.
Benchmarks on the 780M show strong token generation and throughput (about 250+ pp and ~20 tg) when using a Q6_K quantized GGUF variant.
To get Q6 working reliably, the user had to adjust Vulkan/driver-related kernel parameters, including increasing GTT and hang timeout.
The user notes the model performs well even with the full context length, and they praise the Qwen team for the result.

I have ThinkPad T14 Gen 5 (8840U, Radeon 780M, 64GB DDR5 5600 MT/s ). Tried out the recent Qwen MoE release, and pp/tg speed is good (on vulkan) (250+pp, 20 tg):

~/dev/llama.cpp master* ❯ ./build-vulkan/bin/llama-bench \ -hf AesSedai/Qwen3.6-35B-A3B-GGUF:Q6_K \ -fa 1 \ -ub 1024 \ -b 1024 \ -p 1024 -n 128 -mmp 0 ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = AMD Radeon 780M Graphics (RADV PHOENIX) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat | model | size | params | backend | ngl | n_batch | n_ubatch | fa | mmap | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -------: | -: | ---: | --------------: | -------------------: | | qwen35moe 35B.A3B Q8_0 | 27.10 GiB | 34.66 B | Vulkan | 99 | 1024 | 1024 | 1 | 0 | pp1024 | 282.40 ± 6.55 | | qwen35moe 35B.A3B Q8_0 | 27.10 GiB | 34.66 B | Vulkan | 99 | 1024 | 1024 | 1 | 0 | tg128 | 20.74 ± 0.12 | build: ffdd983fb (8916) ~/dev/llama.cpp master* 1m 13s

In order to run Q6 I had to tweak kernel params (increased GTT and hang timeout), it works well even for the full context.

Pretty impressive I'd say. Kudos to Qwen team!

submitted by /u/itroot
[link] [comments]

Black Hat USA

AI Business

The 67th Attempt: When Your "Knowledge Management" System Becomes a Self-Fulfilling Prophecy of Excellence

Dev.to

Context Engineering for Developers: A Practical Guide (2026)

Dev.to

GPT-5.5 is here. So is DeepSeek V4. And honestly, I am tired of version numbers.

Dev.to

AI Visibility Tracking Exploded in 2026: 6 Tools Every Brand Needs Now

Dev.to

Qwen3.6 35B-A3B is quite useful on 780m iGPU (llama.cpp,vulkan)

Key Points

Related Articles

Black Hat USA

The 67th Attempt: When Your "Knowledge Management" System Becomes a Self-Fulfilling Prophecy of Excellence

Context Engineering for Developers: A Practical Guide (2026)

GPT-5.5 is here. So is DeepSeek V4. And honestly, I am tired of version numbers.

AI Visibility Tracking Exploded in 2026: 6 Tools Every Brand Needs Now

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer