Qwen 3.5 28B A3B REAP for coding initial impressions

Reddit r/LocalLLaMA / 4/13/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage · Models & Research

Key Points

  • A user shares initial, hands-on impressions of the Qwen 3.5 28B A3B REAP coding model, comparing quantized Hugging Face variants and reporting real CPU throughput on a Haswell i7 with 32GB RAM.
  • On their setup, the Qwen 3.5 28B A3B REAP GGUF runs at about 7.5 tokens/sec (Q4_K_M), with noticeable slowdowns as context length grows during multi-turn prompt threads.
  • The model appears verbose in llama.cpp, producing detailed “thinking/planning” steps before outputting final answers, and it often references related concerns—making it useful for documentation tasks.
  • The author notes strong practical coding assistance (including generating well-formatted Markdown docs) and generally good code proposals, but also observes limitations in complex refactoring where multi-file/structural changes can require iterative follow-up.
  • In a challenging shell-script refactoring scenario (bug fixes plus adapting to changed JSON data structure), the model eventually produces working refactored code, but may initially miss required updates and needs a second run to fully align script logic with the new data format.

This is a follow-up to
https://www.reddit.com/r/LocalLLaMA/comments/1sf8zp8/qwen_3_coder_30b_is_quite_impressive_for_coding/

Based on the comments I've reviewed, I'd guess Qwen 3.5 (and Gemma 4) are considered among the best models published for public consumption.

The original models on Hugging Face are here:
https://huggingface.co/collections/Qwen/qwen35
Unsloth contributed various quants:
https://huggingface.co/collections/unsloth/qwen35

Among the models I tried, on my plain old Haswell i7 CPU with 32 GB DRAM, all Q4_K_M quants:

unsloth/Qwen3.5-27B-GGUF: 0.95 tokens/s
unsloth/Qwen3.5-35B-A3B-GGUF: 4 tokens/s
barozp/Qwen-3.5-28B-A3B-REAP-GGUF: 7.5 tokens/s
https://huggingface.co/barozp/Qwen-3.5-28B-A3B-REAP-GGUF

Tokens/s degrades as the context becomes larger, e.g. when following up with prompts in the same context/thread. It can fall from that 7.5 gradually down to about 1 token/s.

What I settled on is Qwen-3.5-28B-A3B-REAP-GGUF, as it is 'small' enough to deliver barely adequate throughput (7.5 t/s) on my hardware.
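For reference, a typical way to run a GGUF quant like this with llama.cpp's CLI looks roughly like the following. The model path, thread count, and context size here are illustrative assumptions, not the exact command from the post:

```shell
# Illustrative llama.cpp invocation (paths and values are assumptions).
# -t pins the number of CPU threads (relevant on a CPU-only Haswell box);
# -c sets the context window, which is what grows in multi-turn threads
# and drives the tokens/s degradation described above.
./llama-cli \
  -m models/Qwen-3.5-28B-A3B-REAP-Q4_K_M.gguf \
  -t 8 \
  -c 8192 \
  -p "Analyse the following shell script and write usage documentation in Markdown."
```

This is a command fragment that needs a local multi-GB model file to actually run; treat it as a starting point to adapt, not a verified benchmark setup.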

---
Initial impressions are that Qwen 3.5 tends to mention related concerns and references, and in llama.cpp it produces pretty verbose 'thinking'/planning steps before returning the actual response.

The mentions of related stuff make it a good documenter. I actually tasked it to analyse the code of a shell script and prepare usage documentation for it, and it did this pretty well, in a nicely formatted .md.

Code proposals are good (and some just OK), but the most interesting thing, which I always try to get LLMs to do and which is probably 'hard' for these small LLMs, is to *refactor* code.

I asked it to refactor a shell script: fix some bugs and adapt it to some structural changes in the data (e.g. the JSON format), quite a complex task, I'd think, for such a 'small' LLM. It burned through more than 10k tokens in the 'thinking' phase, but eventually returned refactored code. I'd guess this LLM is kind of 'careful': I've seen it iterating over the same issues with 'wait ...', considering the dependencies and issues. The resulting code is not the best possible refactoring; I'd guess it tried to follow the requirements of my prompt closely.

Among the tasks was a recursive proposal, i.e. refactor the JSON data structure, then refactor the shell script to handle the new data structure. It refactored the JSON data structure, but missed updating the shell script to work with the new structure. It took a second run, with the new data structure and the script, for the new structure to be handled.
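To illustrate the kind of mismatch described above, here is a minimal hypothetical sketch (the field names and JSON are invented, not from the author's actual script): the data gets restructured, but the script's extraction logic still targets the old layout until it is updated in a second pass.

```shell
#!/bin/sh
# Hypothetical data: a flat "out_file" key moved under a nested "output" object.
old_json='{"out_file": "/tmp/run.log", "retries": 3}'
new_json='{"output": {"file": "/tmp/run.log"}, "retries": 3}'

# v1: the original script's extraction, written for the flat layout
# (crude sed parsing, adequate for this fixed one-line format only).
get_out_file_v1() {
  printf '%s\n' "$1" | sed -n 's/.*"out_file": "\([^"]*\)".*/\1/p'
}

# v2: the extraction after the "second run", updated for the nested layout.
get_out_file_v2() {
  printf '%s\n' "$1" | sed -n 's/.*"output": {"file": "\([^"]*\)"}.*/\1/p'
}

get_out_file_v1 "$new_json"   # prints nothing: old logic misses the new layout
get_out_file_v2 "$new_json"   # prints the path from the new layout
```

The point is simply that restructuring the data and restructuring the consumer are two coupled edits; an LLM (or a human) that does only the first leaves a silently broken script, which matches the second-run behaviour the author saw.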
In addition, if the prompt is too ambiguous, it can go in loops in the 'thinking' phase trying to resolve the ambiguity. When I see that, I tend to stop the inference and restructure my prompt to be more specific, which helps it get to a solution.

submitted by /u/ag789