Comparing Qwen3.5 27B vs Gemma 4 31B for agentic stuff
Reddit r/LocalLLaMA / 4/14/2026
💬 Opinion · Signals & Early Trends · Tools & Practical Usage · Models & Research

From the post (which lists the models compared and the main flags used for both): "I know they may not be the best and I still need more experiments (thank you u/Sadman782). I find these tests fun and interesting. Please let me know if you need more tests."
Key Points
- The post compares two local LLM variants for “agentic” tasks: Qwen3.5-27B-UD-Q5_K_XL and gemma-4-31B-it-UD-Q5_K_XL, using similar runtime flags and settings.
- Both models are tested with reasoning enabled, a long-context configuration, flash attention, GPU-layer offloading, and an image-token limit, plus a multimodal projector for image handling (a hedged launch-command sketch follows this list).
- Qwen3.5 is reported to take more steps and perform extra checks (including environment-variable checks), and to sometimes switch scripting languages mid-task (e.g., writing a Python script instead of Bash), which reportedly improves the quality of the final result.
- Gemma 4 is described as more direct, often finding relevant URLs more effectively, but it sometimes fails to complete the final goal; in one example, the Telegram message it sent was truncated.
- The author stresses that these are preliminary, fun experiments and invites readers to request further tests before concluding which model is better for agentic workflows.
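
The post does not reproduce the exact launch commands, but the settings in the second bullet map onto a llama.cpp-style `llama-server` invocation (the UD-quant file names and the multimodal projector suggest llama.cpp, though the backend is not stated). The sketch below is a minimal guess under those assumptions: the file names, context size, and layer count are illustrative, and the reasoning toggle and image-token limit are omitted because their exact flags vary by build and model.

```bash
# Illustrative llama-server launch; values are assumptions, not the author's.
# --mmproj loads the multimodal projector for image input, --ctx-size sets the
# long context, -ngl offloads layers to the GPU, and --flash-attn enables
# flash attention (newer builds take on/off/auto; older builds take the bare flag).
llama-server \
  -m Qwen3.5-27B-UD-Q5_K_XL.gguf \
  --mmproj mmproj-F16.gguf \
  --ctx-size 32768 \
  --flash-attn on \
  -ngl 99 \
  --jinja

# For the comparison run, swap in gemma-4-31B-it-UD-Q5_K_XL.gguf (and its own
# projector file) while keeping every other flag identical.
```

`--jinja` enables the model's chat template, which tool calling and reasoning parsing generally require; keeping every other flag identical between the two runs is what makes the head-to-head comparison meaningful.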


