Decided to try out the new --spec-type ngram-mod feature in llama.cpp with Qwen3.6 27B during an OpenCode bug-chasing session. TL;DR: performance is variable, but so far it seems to give a nice speed increase when working repeatedly on the same code base.
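For intuition on why repeated work on the same code base might help, here is a toy sketch of generic n-gram lookup drafting: find an earlier occurrence of the last few tokens and propose whatever followed it, so the main model only has to verify the guess. This is a simplified illustration, not llama.cpp's actual ngram-mod implementation; the function name and the `n` and `max_draft` parameters are made up for the example.

```python
def draft_from_history(tokens, n=3, max_draft=8):
    """Propose draft tokens by matching the last n tokens earlier in the sequence."""
    if len(tokens) <= n:
        return []
    key = tokens[-n:]
    # Walk backwards so the most recent earlier match wins.
    for start in range(len(tokens) - n - 1, -1, -1):
        if tokens[start:start + n] == key:
            # Tokens that followed the earlier occurrence become the draft.
            return tokens[start + n:start + n + max_draft]
    return []


# Words stand in for token IDs here (an assumption made for readability).
history = ["for", "i", "in", "range", "(", "n", ")", ":", "for", "i", "in"]
print(draft_from_history(history, n=3, max_draft=4))
# → ['range', '(', 'n', ')']
```

When the context is full of code that gets re-emitted with small changes, lookups like this hit often and long drafts get accepted; when the model writes something new, the lookup misses and generation falls back to normal speed. That would fit the "variable" part of the results.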
Here's a baseline llama-bench test:
$: llama-bench-vulkan -m 'Qwen3.6-27B-UD-Q4_K_XL.gguf'
WARNING: radv is not a conformant Vulkan implementation, testing use only.
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon AI PRO R9700 (RADV GFX1201) (radv) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen35 27B Q4_K - Medium       |  16.39 GiB |    26.90 B | Vulkan     |  99 |           pp512 |       1050.13 ± 0.54 |
| qwen35 27B Q4_K - Medium       |  16.39 GiB |    26.90 B | Vulkan     |  99 |           tg128 |         31.26 ± 0.01 |

build: 97895129e (8863)

My llama-server run flags:
llama-server-vulkan \
  -m '/Qwen3.6-27B-UD-Q4_K_XL.gguf' \
  --mmproj '/mmproj-BF16(3).gguf' \
  -np 1 -ngl 99 \
  --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.00 --presence_penalty 0.00 \
  --jinja --chat-template-kwargs '{"preserve_thinking": true}' \
  -ub 2048 -fa 1 \
  --spec-type ngram-mod --spec-ngram-size-n 24 --draft-min 12 --draft-max 48 \
  --host 0.0.0.0 --port 8180

Stats Summary:
--- Prompt Processing (PPS) Statistics ---
Mean:   549.60 t/s
Median: 519.19 t/s
P95:    936.60 t/s
StdDev: 240.80 (Stability)
Range:  64.18 - 1015.91 t/s

--- Token Generation (Tok/s) Statistics ---
Mean:   28.80 t/s
Median: 28.20 t/s
P95:    45.34 t/s
StdDev: 6.78 (Stability)
Range:  16.49 - 53.63 t/s

Total Tokens Generated: 87840

$:~/Documents/llama_perf$ python3 parse_performance_stats_full.py

== Prompt Processing (PPS) Analysis ==
Effective Avg: 549.60 t/s (Token-Weighted)
Median (P50):  519.19 t/s
Tail (P99):    958.31 t/s
Stability(CV): 43.8% (JITTERY)
Skewness:      0.04 (Symmetric)

== Token Generation (Tok/s) Analysis ==
Effective Avg: 1697.20 t/s (Token-Weighted)
Median (P50):  28.20 t/s
Tail (P99):    51.39 t/s
Stability(CV): 23.5% (JITTERY)
Skewness:      1.40 (Burst Heavy)

Raw data:
$:~/Documents/llama_perf$ python3 parse_performance_stats.py
Task ID | PPS (Prompt) | Tok/s (Gen) | Gen Tokens
------------------------------------------------------------
7824    | 72.51        | 25.76       | 340
8053    | 330.16       | 22.49       | 709
8629    | 345.13       | 20.84       | 1820
10286   | 64.18        | 28.11       | 181
10372   | 309.37       | 19.31       | 123
10496   | 360.21       | 27.07       | 891
11071   | 345.78       | 34.59       | 1595
11810   | 349.13       | 21.83       | 389
12124   | 304.43       | 27.89       | 438
12364   | 320.76       | 24.20       | 408
12673   | 304.25       | 22.16       | 281
12899   | 281.09       | 19.12       | 286
13188   | 777.57       | 25.27       | 1428
14644   | 970.67       | 30.00       | 231
14863   | 834.32       | 32.17       | 98
14944   | 651.29       | 35.26       | 90
15012   | 690.06       | 28.15       | 98
15101   | 706.03       | 30.84       | 97
15177   | 678.13       | 39.51       | 100
15243   | 695.42       | 28.46       | 85
15330   | 347.35       | 27.75       | 83
15404   | 527.11       | 28.71       | 79
15485   | 495.88       | 28.83       | 73
15552   | 757.88       | 28.85       | 70
15610   | 754.61       | 27.08       | 106
15716   | 343.11       | 30.13       | 82
15784   | 597.03       | 28.51       | 77
15848   | 724.77       | 25.24       | 91
15932   | 612.62       | 40.13       | 87
15986   | 603.72       | 28.13       | 125
16105   | 545.72       | 27.96       | 105
16212   | 140.18       | 30.04       | 53
16256   | 518.56       | 27.60       | 1330
17587   | 705.96       | 27.46       | 336
1       | 891.36       | 27.73       | 1644
1621    | 689.95       | 30.96       | 750
2238    | 87.37        | 27.05       | 348
2593    | 86.72        | 27.15       | 2003
4593    | 86.10        | 27.07       | 161
4728    | 431.04       | 26.33       | 178
4900    | 86.53        | 28.26       | 112
4987    | 87.27        | 27.09       | 161
5129    | 346.48       | 28.73       | 104
5214    | 426.83       | 37.51       | 147
5295    | 369.10       | 27.33       | 74
5371    | 258.20       | 27.12       | 172
5545    | 82.23        | 28.34       | 83
5619    | 78.99        | 39.80       | 163
5711    | 342.33       | 25.94       | 103
5814    | 557.16       | 27.15       | 92
5908    | 82.57        | 24.07       | 112
6011    | 655.56       | 16.87       | 255
6250    | 538.12       | 16.73       | 259
6509    | 226.40       | 19.07       | 78
6572    | 380.42       | 17.08       | 84
6650    | 369.20       | 17.92       | 176
6805    | 542.54       | 19.01       | 133
6917    | 508.31       | 17.65       | 711
7567    | 592.44       | 21.26       | 113
0       | 825.63       | 26.19       | 258
265     | 570.25       | 26.75       | 170
410     | 400.81       | 24.33       | 97
501     | 495.63       | 25.28       | 153
649     | 602.06       | 22.47       | 315
871     | 317.47       | 16.50       | 746
1616    | 75.78        | 16.49       | 105
1717    | 458.49       | 16.79       | 111
1830    | 135.83       | 16.80       | 347
0       | 837.89       | 26.31       | 764
794     | 651.57       | 24.01       | 116
905     | 224.91       | 25.38       | 80
969     | 551.64       | 29.70       | 81
1029    | 547.99       | 24.96       | 89
1118    | 545.28       | 25.38       | 86
1187    | 596.21       | 25.20       | 81
1267    | 387.68       | 25.03       | 83
1342    | 526.17       | 25.98       | 616
1960    | 795.61       | 23.57       | 177
2169    | 518.94       | 24.00       | 75
2245    | 487.28       | 28.62       | 84
2307    | 519.44       | 26.36       | 218
2506    | 83.51        | 25.92       | 184
2674    | 317.34       | 25.31       | 101
2756    | 491.71       | 25.41       | 690
3424    | 540.33       | 33.60       | 184
3529    | 511.05       | 28.57       | 106
3601    | 523.09       | 27.26       | 471
4014    | 518.84       | 25.74       | 251
4238    | 82.16        | 23.83       | 163
4401    | 338.39       | 46.13       | 83
4437    | 324.35       | 23.52       | 126
4560    | 248.12       | 25.89       | 81
4634    | 443.34       | 24.78       | 182
4804    | 463.62       | 28.23       | 83
4872    | 438.71       | 31.26       | 635
5352    | 504.33       | 22.47       | 96
5439    | 277.02       | 25.48       | 179
5596    | 506.73       | 39.77       | 179
5687    | 493.95       | 23.50       | 69
5757    | 523.45       | 25.08       | 110
5869    | 105.32       | 23.02       | 67
5938    | 200.24       | 24.93       | 316
6256    | 555.49       | 45.34       | 175
6327    | 466.26       | 24.61       | 262
0       | 761.08       | 24.29       | 139
160     | 505.55       | 22.34       | 117
271     | 256.61       | 28.42       | 83
322     | 426.93       | 30.01       | 97
388     | 482.84       | 27.16       | 96
463     | 494.38       | 24.48       | 1150
1613    | 259.32       | 23.89       | 73
1683    | 167.49       | 23.52       | 80
1755    | 318.21       | 24.25       | 3084
4834    | 318.37       | 22.71       | 88
4909    | 451.91       | 24.01       | 160
5051    | 429.60       | 24.10       | 112
5144    | 426.04       | 24.11       | 1209
6326    | 563.82       | 23.99       | 207
6529    | 512.83       | 34.04       | 90
6585    | 498.78       | 28.49       | 92
6656    | 492.01       | 24.35       | 104
6738    | 484.51       | 29.75       | 92
6797    | 450.49       | 29.46       | 95
6859    | 437.55       | 23.36       | 650
7504    | 235.33       | 23.13       | 81
7568    | 405.40       | 27.63       | 126
7661    | 426.11       | 22.62       | 137
7798    | 351.68       | 28.88       | 100
7865    | 445.78       | 23.28       | 122
7981    | 398.07       | 22.79       | 155
8136    | 265.58       | 22.67       | 83
8201    | 375.09       | 23.50       | 446
8623    | 419.87       | 23.31       | 921
9516    | 424.62       | 23.22       | 98
9594    | 399.86       | 23.04       | 557
10133   | 410.36       | 30.93       | 85
10180   | 445.30       | 26.01       | 82
10240   | 384.94       | 25.42       | 147
10356   | 369.66       | 22.97       | 312
10670   | 1011.00      | 29.40       | 153
10819   | 735.71       | 30.75       | 65
10877   | 912.32       | 28.97       | 92
10969   | 829.14       | 28.24       | 132
11108   | 710.79       | 28.56       | 94
11195   | 694.49       | 29.13       | 129
11313   | 440.72       | 28.87       | 67
11373   | 736.58       | 43.25       | 100
11431   | 278.92       | 28.97       | 89
11513   | 564.79       | 30.91       | 97
11585   | 464.87       | 32.45       | 93
11659   | 605.83       | 28.62       | 63
11715   | 727.11       | 28.05       | 180
11879   | 643.30       | 30.79       | 126
11985   | 665.26       | 29.20       | 149
12111   | 492.23       | 27.98       | 72
12176   | 695.06       | 26.40       | 164
12340   | 558.65       | 26.57       | 2933
15263   | 447.12       | 21.40       | 271
15534   | 1015.91      | 30.65       | 87
15619   | 923.95       | 30.58       | 1613
17127   | 455.62       | 21.57       | 186
17307   | 939.74       | 31.02       | 70
17371   | 897.35       | 33.11       | 1213
18401   | 450.77       | 23.31       | 694
19047   | 939.26       | 30.94       | 71
19112   | 921.63       | 29.57       | 1399
20514   | 440.08       | 21.55       | 179
20680   | 941.92       | 30.28       | 86
20769   | 916.08       | 29.72       | 213
20985   | 630.99       | 28.39       | 90
21076   | 783.87       | 29.83       | 90
21153   | 869.66       | 31.89       | 141
21270   | 559.49       | 28.48       | 163
21434   | 781.38       | 29.42       | 115
21543   | 783.60       | 33.50       | 129
21647   | 542.43       | 29.70       | 88
21728   | 681.01       | 30.92       | 282
21984   | 583.15       | 27.92       | 108
22092   | 87.14        | 26.63       | 117
22207   | 552.15       | 28.99       | 90
22284   | 648.15       | 27.79       | 110
22394   | 758.16       | 29.34       | 103
22482   | 570.20       | 28.52       | 1171
23655   | 449.73       | 22.45       | 191
23840   | 913.13       | 30.05       | 102
23944   | 924.18       | 29.36       | 249
24198   | 797.90       | 30.26       | 76
24266   | 859.60       | 28.60       | 155
24419   | 613.57       | 29.71       | 87
24498   | 696.11       | 34.20       | 105
24578   | 654.08       | 29.09       | 107
24678   | 601.79       | 29.27       | 96
24759   | 667.10       | 28.99       | 116
24868   | 700.61       | 34.60       | 110
24952   | 722.68       | 27.95       | 2270
27224   | 434.52       | 22.17       | 373
27586   | 920.69       | 30.19       | 82
27670   | 923.33       | 29.41       | 135
27802   | 878.87       | 28.93       | 159
27967   | 697.86       | 29.29       | 101
28061   | 694.84       | 35.07       | 114
28150   | 724.74       | 36.25       | 84
28209   | 362.26       | 34.01       | 87
28277   | 726.33       | 33.11       | 119
28375   | 738.59       | 27.36       | 95
28470   | 571.26       | 25.75       | 94
28562   | 372.33       | 28.18       | 80
28631   | 598.19       | 29.04       | 97
28721   | 669.38       | 25.55       | 108
28821   | 396.21       | 31.45       | 86
28887   | 618.82       | 27.92       | 2077
30958   | 429.42       | 22.30       | 405
31356   | 916.46       | 30.26       | 75
31433   | 897.39       | 36.61       | 949
32154   | 417.12       | 34.14       | 398
32348   | 940.13       | 30.26       | 71
32421   | 921.72       | 46.64       | 1434
33187   | 422.44       | 49.40       | 397
33303   | 937.79       | 32.47       | 105
33395   | 924.34       | 29.25       | 1684
35077   | 418.33       | 48.17       | 421
35215   | 928.92       | 30.81       | 78
35287   | 906.27       | 29.21       | 2857
38060   | 422.58       | 48.37       | 402
38182   | 936.60       | 34.20       | 72
38240   | 916.12       | 44.28       | 3143
39949   | 421.28       | 44.29       | 415
40073   | 939.96       | 30.25       | 75
40150   | 905.92       | 40.91       | 1662
41202   | 412.22       | 47.27       | 403
41325   | 938.87       | 30.36       | 76
41403   | 916.59       | 38.85       | 1532
42476   | 399.14       | 48.52       | 402
42586   | 938.19       | 34.64       | 74
42645   | 915.96       | 32.35       | 1551
43997   | 407.69       | 53.03       | 383
44096   | 930.86       | 31.11       | 68
44157   | 919.13       | 29.52       | 853
45012   | 398.91       | 49.45       | 387
45118   | 935.23       | 30.34       | 83
45203   | 925.79       | 52.86       | 1615
45981   | 396.90       | 48.34       | 390
46092   | 936.96       | 30.29       | 88
46182   | 915.64       | 53.63       | 2544
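For anyone who wants to recompute the summary numbers from the raw rows, here is a rough sketch. The parsing scripts themselves are not shown in the post, so `parse_rows` and `summarize` are hypothetical names and the approach is an assumption. The reported CV values do match stdev divided by mean (6.78 / 28.80 ≈ 23.5% for generation, 240.80 / 549.60 ≈ 43.8% for prompt processing). The "Effective Avg: 1697.20 t/s" figure for generation, however, is above the fastest single request (53.63 t/s), so it cannot be a weighted mean of these rates. The sketch computes a token-weighted rate as total tokens divided by total generation time, which always lands between the slowest and fastest request.

```python
import statistics


def parse_rows(text):
    """Parse 'task_id | pps | tok/s | gen_tokens' lines into tuples."""
    rows = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 4:
            continue  # separator lines, prompts, blank lines
        try:
            rows.append((int(parts[0]), float(parts[1]),
                         float(parts[2]), int(parts[3])))
        except ValueError:
            continue  # header row
    return rows


def summarize(rates, tokens):
    """Basic stats plus a token-weighted rate (total tokens / total seconds)."""
    mean = statistics.fmean(rates)
    stdev = statistics.stdev(rates)
    # Each request took tokens / rate seconds to generate.
    total_seconds = sum(t / r for t, r in zip(tokens, rates))
    return {
        "mean": mean,
        "median": statistics.median(rates),
        "stdev": stdev,
        "cv_percent": 100.0 * stdev / mean,
        "token_weighted": sum(tokens) / total_seconds,
    }


# First three rows of the raw data, used as a small sample.
sample = """\
Task ID | PPS (Prompt) | Tok/s (Gen) | Gen Tokens
7824 | 72.51 | 25.76 | 340
8053 | 330.16 | 22.49 | 709
8629 | 345.13 | 20.84 | 1820
"""

rows = parse_rows(sample)
gen = summarize([r[2] for r in rows], [r[3] for r in rows])
print(gen)
```

The same weighting cannot be applied to prompt processing from this table alone, since it lists generated tokens per request but not prompt tokens.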




