For the past few weeks, I have been trying to get this model working on my hardware. It still feels incredible how much better open models have become. I couldn't have gotten this model to work on my 5-year-old laptop without this sub and its amazing people. The model is actually usable at ~23 t/s, and I even get 10+ t/s when unplugged! It works very well with pi agent.
If you think this setup can be improved, I'd love to hear about it.
I've documented my full localmaxxing journey in a blog post here; someone might find it helpful.
TL;DR
Laptop: Asus ROG Zephyrus G14 2020
CPU: Ryzen 7 (8c/16t) @ 2900 MHz (boost disabled)
Mem: 24GB DDR4-3200 RAM
GPU: RTX 2060 Max-Q 6GB VRAM
General:
```bash
#!/bin/bash
llama-server \
  -m ~/dev/models/Qwen3.6-35B-A3B-APEX-GGUF/Qwen3.6-35B-A3B-APEX-I-Compact.gguf \
  -mm ~/dev/models/Qwen3.6-35B-A3B-GGUF/mmproj-F16.gguf \
  --no-mmproj-offload \
  -a Qwen3.6-35B-A3B-APEX-64k \
  --host 0.0.0.0 --port 8000 \
  --fit off -fa on \
  --ctx-size 65536 \
  --threads 8 --threads-batch 12 \
  --cpu-range 0-7 --cpu-strict 1 \
  --cpu-range-batch 0-11 --cpu-strict-batch 1 \
  --numa isolate \
  --prio 2 \
  --no-mmap --parallel 1 --jinja \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  --ubatch-size 1024 --batch-size 2048 \
  --n-cpu-moe 36 \
  --cache-reuse 256 \
  --ctx-checkpoints 8 \
  --metrics \
  --cache-ram 4096 \
  --spec-type ngram-mod \
  --spec-ngram-mod-n-match 24 --spec-ngram-mod-n-min 12 --spec-ngram-mod-n-max 48
```

Long Context: (Tom's fork)
```bash
#!/bin/bash
lm-server-tq \
  -m ~/dev/models/Qwen3.6-35B-A3B-APEX-GGUF/Qwen3.6-35B-A3B-APEX-I-Compact.gguf \
  -a Qwen3.6-35B-A3B-APEX-128k \
  --host 0.0.0.0 --port 8000 \
  --fit off -fa on \
  --ctx-size 131072 \
  --threads 8 --threads-batch 12 \
  --cpu-range 0-7 --cpu-strict 1 \
  --cpu-range-batch 0-11 --cpu-strict-batch 1 \
  --numa isolate \
  --prio 2 \
  --no-mmap --parallel 1 --jinja \
  --cache-type-k turbo3 --cache-type-v turbo4 \
  --ubatch-size 1024 --batch-size 2048 \
  --n-cpu-moe 36 \
  --cache-reuse 256 \
  --ctx-checkpoints 8 \
  --metrics \
  --cache-ram 4096 \
  --spec-type ngram-mod \
  --spec-ngram-mod-n-match 24 --spec-ngram-mod-n-min 12 --spec-ngram-mod-n-max 48
```
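If you're wondering why I quantize the KV cache (`--cache-type-k q8_0 --cache-type-v q8_0`) at these context sizes, here's a rough sizing sketch. The layer count, KV-head count, and head dimension below are made-up placeholder values, not this model's real config (check the GGUF metadata for your model); the point is just how the cache scales with context and element size (f16 is 2 bytes per element, q8_0 is roughly 8.5 bits).

```python
# Rough KV-cache size estimate: K and V each store
# ctx * n_kv_heads * head_dim elements per layer.
def kv_cache_bytes(ctx, n_layers, n_kv_heads, head_dim, bytes_per_elem):
    return 2 * n_layers * ctx * n_kv_heads * head_dim * bytes_per_elem

F16 = 2.0        # 16 bits per element
Q8_0 = 8.5 / 8   # q8_0 is ~8.5 bits per element (8-bit values + per-block scale)

ctx = 65536
# HYPOTHETICAL GQA config for illustration only:
layers, kv_heads, head_dim = 48, 4, 128

f16_gib = kv_cache_bytes(ctx, layers, kv_heads, head_dim, F16) / 2**30
q8_gib = kv_cache_bytes(ctx, layers, kv_heads, head_dim, Q8_0) / 2**30
print(f"f16 KV cache: {f16_gib:.1f} GiB, q8_0: {q8_gib:.1f} GiB")
# -> f16 KV cache: 6.0 GiB, q8_0: 3.2 GiB
```

With numbers in this ballpark, q8_0 roughly halves the cache versus f16, which is what makes 64k+ contexts fit next to the weights on a machine like this.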