Looks like it's finally here! Testing it right away!!!
https://github.com/ggml-org/llama.cpp/releases/tag/b8967
Platform: RTX 5090 (+ RTX 5060 Ti, not used during the test), Ryzen 9 9950X3D, 128 GB DDR5-5600 CL36
Test:
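```shell
# CUDA_VISIBLE_DEVICES=0 exposes only one GPU (the RTX 5090) to the benchmark.
# -ngl 999: offload all layers; -fa 1: flash attention on; -p/-n: prompt/generation lengths
# -d: tokens already in context before each test; -r 5: repetitions; -o md: markdown output
CUDA_VISIBLE_DEVICES=0 /home/marcin/llama.cpp/llama-bench \
-m /home/marcin/llama.cpp_models/Qwen3.6-27B-NVFP4/Qwen3.6-27B-NVFP4.gguf \
-ngl 999 \
-fa 1 \
-p 512,2048 \
-n 128,512 \
-d 0,4096,8192,16384,32768 \
-r 5 \
-o md | tee /home/marcin/qwen3.6-27b-nvfp4-gpu0-bench-depth.md
```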
| model | size | params | backend | ngl | fa | test | t/s |
| ----- | ---: | -----: | ------- | --: | -: | ---: | --: |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | pp512 | 5546.93 ± 220.29 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | pp2048 | 5594.58 ± 7.70 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | tg128 | 73.62 ± 0.16 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | tg512 | 73.68 ± 0.05 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | pp512 @ d4096 | 5232.92 ± 144.37 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | pp2048 @ d4096 | 5272.82 ± 7.11 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | tg128 @ d4096 | 72.47 ± 0.16 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | tg512 @ d4096 | 72.50 ± 0.06 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | pp512 @ d8192 | 4995.34 ± 135.04 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | pp2048 @ d8192 | 5005.44 ± 4.18 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | tg128 @ d8192 | 71.57 ± 0.18 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | tg512 @ d8192 | 71.61 ± 0.06 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | pp512 @ d16384 | 4537.54 ± 129.55 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | pp2048 @ d16384 | 4547.25 ± 3.11 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | tg128 @ d16384 | 70.04 ± 0.16 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | tg512 @ d16384 | 69.90 ± 0.06 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | pp512 @ d32768 | 3586.58 ± 71.03 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | pp2048 @ d32768 | 3560.58 ± 2.65 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | tg128 @ d32768 | 66.88 ± 0.11 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | tg512 @ d32768 | 66.98 ± 0.02 |
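To read the depth scaling off the table more easily, here is a small sketch (not part of the original run) that parses llama-bench markdown rows and reports each depth's throughput as a fraction of the depth-0 result. It embeds a subset of the rows above (pp2048 and tg128) instead of reading the tee'd file, and the regex assumes the `test` and `t/s` cells look exactly like these rows (`pp2048 @ d4096 | 5272.82 ± 7.11`); other llama-bench versions may format them differently.

```python
import re

# Subset of the llama-bench rows above (pp2048 and tg128 at every depth).
BENCH_MD = """\
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | pp2048 | 5594.58 ± 7.70 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | tg128 | 73.62 ± 0.16 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | pp2048 @ d4096 | 5272.82 ± 7.11 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | tg128 @ d4096 | 72.47 ± 0.16 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | pp2048 @ d8192 | 5005.44 ± 4.18 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | tg128 @ d8192 | 71.57 ± 0.18 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | pp2048 @ d16384 | 4547.25 ± 3.11 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | tg128 @ d16384 | 70.04 ± 0.16 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | pp2048 @ d32768 | 3560.58 ± 2.65 |
| qwen35 27B NVFP4 | 17.50 GiB | 26.90 B | CUDA | 999 | 1 | tg128 @ d32768 | 66.88 ± 0.11 |
"""

# Assumed row format: a "test" cell like "tg128 @ d4096" followed by a "mean ± stddev" cell.
ROW = re.compile(r"\|\s*(pp|tg)(\d+)(?:\s*@\s*d(\d+))?\s*\|\s*([\d.]+)\s*±")


def parse_bench(md: str) -> dict[tuple[str, int], float]:
    """Map (test name, depth) -> mean tokens/s; rows without '@ dN' get depth 0."""
    results = {}
    for line in md.splitlines():
        m = ROW.search(line)
        if m:
            kind, size, depth, mean = m.groups()
            results[(f"{kind}{size}", int(depth or 0))] = float(mean)
    return results


def retention(results: dict[tuple[str, int], float], test: str) -> dict[int, float]:
    """Throughput at each depth as a fraction of the depth-0 throughput, ordered by depth."""
    base = results[(test, 0)]
    return {
        depth: tps / base
        for (name, depth), tps in sorted(results.items())
        if name == test
    }


if __name__ == "__main__":
    res = parse_bench(BENCH_MD)
    for test in ("pp2048", "tg128"):
        for depth, frac in retention(res, test).items():
            print(f"{test} @ d{depth:>5}: {res[(test, depth)]:8.2f} t/s ({frac:.1%} of d0)")
```

On these numbers, pp2048 at depth 32768 keeps roughly 64% of its depth-0 throughput, while tg128 keeps roughly 91%.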
A full comparison on the same model, native NVFP4 support in llama.cpp (native build) vs. non-native, is available here:
https://www.reddit.com/r/LocalLLaMA/comments/1syxckc/llamacpp_benchmark_native_vs_non_native_nvfp4_on/