https://huggingface.co/unsloth/gemma-4-E2B-it-GGUF
https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF
by u/danielhanchen:
We just updated them again in response to:
- kv-cache : support attention rotation for heterogeneous iSWA https://github.com/ggml-org/llama.cpp/pull/21513
- CUDA: check for buffer overlap before fusing - CRITICAL fixes for `<unused24>` tokens https://github.com/ggml-org/llama.cpp/pull/21566
- vocab : add byte token handling to BPE detokenizer for Gemma4 https://github.com/ggml-org/llama.cpp/pull/21488
- convert : set "add bos" == True for Gemma 4 https://github.com/ggml-org/llama.cpp/pull/21500
- common : add gemma 4 specialized parser https://github.com/ggml-org/llama.cpp/pull/21418
- llama-model: read final_logit_softcapping for Gemma 4 https://github.com/ggml-org/llama.cpp/pull/21390
- llama: add custom newline split for Gemma 4 https://github.com/ggml-org/llama.cpp/pull/21406
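One of the PRs above reads the `final_logit_softcapping` hyperparameter from the model file. The soft-cap itself is the Gemma-family trick of squashing final logits through a scaled tanh so no logit can exceed the cap; the formula below follows the published Gemma 2 recipe (`cap * tanh(x / cap)`) and is a hedged sketch, not the llama.cpp source:

```python
import math

def softcap(logits: list[float], cap: float) -> list[float]:
    # Gemma-style final-logit soft-capping: cap * tanh(x / cap).
    # Near zero it is ~identity (tanh(x) ~ x), while large logits
    # are smoothly bounded to the open interval (-cap, cap).
    return [cap * math.tanh(x / cap) for x in logits]

capped = softcap([0.0, 1.5, 500.0], 30.0)
```

Small logits pass through almost unchanged, while extreme logits saturate just below the cap, which keeps sampling stable without the hard clipping that would flatten relative probabilities.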



