https://github.com/ggml-org/llama.cpp/pull/19493
Some prompts get a speedup, others don't (typically cases where the draft acceptance streak is low).
Good parameter values depend on the task type and how repetitive the output is.
For coding, I saw roughly a 0%~50% speedup with these params:
--spec-type ngram-mod --spec-ngram-size-n 24 --draft-min 48 --draft-max 64
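For anyone unfamiliar with what these flags tune: n-gram speculative decoding drafts tokens by looking for an earlier occurrence of the current context suffix in the generated text and replaying what followed it. Below is a minimal, simplified Python sketch of that idea, not the PR's actual implementation; the function name and the mapping of `n`, `draft_min`, and `draft_max` onto the flags are my assumptions.

```python
def ngram_draft(tokens, n=4, draft_min=2, draft_max=8):
    """Propose draft tokens by n-gram lookup over the token history.

    Simplified illustration only (not llama.cpp's code): find the most
    recent earlier occurrence of the last `n` tokens, then copy up to
    `draft_max` tokens that followed it. Return nothing if the match
    yields fewer than `draft_min` tokens.
    """
    if len(tokens) < n:
        return []
    key = tokens[-n:]
    # Scan backwards, skipping the suffix itself (hence len - n - 1).
    for i in range(len(tokens) - n - 1, -1, -1):
        if tokens[i:i + n] == key:
            draft = tokens[i + n:i + n + draft_max]
            return draft if len(draft) >= draft_min else []
    return []
```

The intuition behind the reported results: repetitive output (boilerplate-heavy code) produces long matches that the target model accepts in streaks, while novel text produces short or rejected drafts, so drafting overhead buys nothing.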




