https://github.com/ggml-org/llama.cpp/pull/21038
Now that cache quantization has better quality, does that mean a Q8 cache is a good choice? For example, for 26B Gemma 4?
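For context, a minimal sketch of how a Q8 KV cache would be enabled in llama.cpp, using the existing `--cache-type-k`/`--cache-type-v` flags (the model path and context size below are hypothetical placeholders, not taken from the PR):

```
# Rough sketch: run llama-server with the K and V caches quantized to Q8_0.
# --cache-type-k / --cache-type-v select the cache storage types
# (default is f16); a quantized V cache has required flash attention
# on past builds. Model path and context size are placeholders.
./llama-server \
  -m ./gemma-26b.Q4_K_M.gguf \
  -c 16384 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0
```

Q8_0 stores the cache at roughly half the memory footprint of the default f16, so the question is whether the quality improvements in the linked PR make that trade-off effectively free.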
Reddit r/LocalLLaMA / 4/14/2026