I get a ~11% speed up with Gemma 3 270B as the draft model. Try it by adding:
--no-mmproj -hfd unsloth/gemma-3-270m-it-GGUF:Q8_0 Testing with (on a 3090):
./build/bin/llama-cli -hf unsloth/gemma-4-31B-it-GGUF:Q4_1 --jinja --temp 1.0 --top-p 0.95 --top-k 64 -ngl 1000 -st -f prompt.txt --no-mmproj -hfd unsloth/gemma-3-270m-it-GGUF:Q8_0 Gave me:
[ Prompt: 607.3 t/s | Generation: 36.6 t/s ]
draft acceptance rate = 0.44015 ( 820 accepted / 1863 generated)
vs.
[ Prompt: 613.8 t/s | Generation: 32.9 t/s ]
[link] [comments]

