it's the Gemma 4 26B A3B IQ4 NL.
My llama.cpp command is:
llama-server.exe -m "gemma-4-26B-A4B-it-UD-IQ4_NL.gguf" -ngl 999 -fa on -c 65536 -ctk q8_0 -ctv q8_0 --batch-size 1024 --ubatch-size 512 --temp 1.0 --top-k 64 --top-p 0.95 --min-p 0.0 --repeat-penalty 1.0 --no-warmup --port 8080 --host 0.0.0.0 --chat-template-kwargs "{\"enable_thinking\":true}" --perf
In essence, this is just the recommended setting's from Google, but this has served me damn well as a co-assistant to Claude Code in VS Code.
I gave it tests, and it's around 6.5/10. It reads my guide.md, it follows it, reads files, and many more. Its main issue is that it can't get past the intricacies of packages. What I mean by that is that it can't connect files to each other with full accuracy.
But that's it for its issues. Everything else has been great since it has a large context size and fast <100 tokens per second. This is one of the few models that have passed the carwash test from my testing.
[link] [comments]



