Im using these settings in llama.cpp: --spec-type ngram-map-k --spec-ngram-size-n 24 --draft-min 12 --draft-max 48
Whats the real reason for lets say the prompt is for "minor changes in code", whats differing between models:
Gemma 4 31b: Doubles in tks gen so 100%
Qwen 3.6: Only 40% more speed
Devstrall small: 665% increase in speed (what?)
EDIT:
added --repeat-penalty 1.0 and --spec-type ngram-mod instead for Qwen 3.6, now speed is increased by 140tks over 100tks base in minor edits.
[link] [comments]


![Runtime security for AI agents: risk scoring, policy enforcement, and rollback for production agent pipeline [P]](/_next/image?url=https%3A%2F%2Fpreview.redd.it%2Fjaatbenjg9wg1.jpg%3Fwidth%3D140%26height%3D80%26auto%3Dwebp%26s%3D43ed5a4d6806da42e7feccd461f2fe78add2eae0&w=3840&q=75)

