Another appreciation post for qwen3.5 27b model

Reddit r/LocalLLaMA / 3/24/2026


Key Points

  • The Reddit poster ran comparison tests for local development use across Qwen3.5 27B (several quantizations), Qwen3.5 122B, Nemotron-3 Super 120B, gpt-oss 120b, and gpt-5.4 high, and reported the overall results.
  • Nemotron-3 Super 120B performed very well, on par with gpt-5.4 high, and Qwen3.5 27B also held its own, while gpt-oss 120b and Qwen3.5 122B performed comparatively worse.
  • On the poster's hardware (multiple RTX 3090s), Qwen3.5 27B at Q6 (Q6_K_XL) is practical for real development tasks, with no additional hardware investment needed, which they see as a major advantage.
  • They also shared runtime figures from vast.ai (context length, token generation speed) and an example llama.cpp/llama-server command, making the local setup easier to reproduce.
  • They suggest this could replace API subscriptions (at least for daily tasks), while continuing to use CODEX for complex tasks.

I tested qwen3.5 122b when it came out, and I really liked it: for my development tests it was on par with gemini 3 flash (my current AI tool for coding), so I was looking at investing in hardware. The problem is I'd need a new mobo and 1 (or 2) more 3090s, and the price is just too high right now.

I saw a lot of posts saying that qwen3.5 27b was better than 122b, which actually didn't make sense to me. Then I saw nemotron 3 super 120b, but people said it was not better than qwen3.5 122b, and I trusted them.

Yesterday and today I tested all these models:

"unsloth/Qwen3.5-27B-GGUF:UD-Q4_K_XL"
"unsloth/Qwen3.5-35B-A3B-GGUF:UD-Q4_K_XL"
"unsloth/Qwen3.5-122B-A10B-GGUF"
"unsloth/Qwen3.5-27B-GGUF:UD-Q6_K_XL"
"unsloth/Qwen3.5-27B-GGUF:UD-Q8_K_XL"
"unsloth/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF:UD-IQ4_XS"
"unsloth/gpt-oss-120b-GGUF:F16"

I also tested against gpt-5.4 high so I can compare them better.

To my surprise nemotron was a very, very good model, on par with gpt-5.4, and qwen3.5-27b did great as well.

Sadly (but also good news), gpt-oss 120b and qwen3.5 122b performed worse than the other two models (good because they would need more hardware).

So I can finally use "Qwen3.5-27B-GGUF:UD-Q6_K_XL" for real development tasks locally. The best part is I don't need to get more hardware (I already own 2x 3090).
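A rough back-of-envelope check of why the 27B Q6 fits in 2x 3090 (48 GiB total). This is a sketch, not a measurement: it assumes Q6_K stores roughly 6.56 bits per weight and ignores KV-cache and activation overhead, which grow with context length.

```python
# Back-of-envelope VRAM estimate for Qwen3.5-27B at Q6_K on 2x RTX 3090.
# Assumptions (not measurements): ~6.56 bits/weight for Q6_K; KV cache
# and activations are ignored, so real usage will be higher.

def gguf_weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GiB."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1024**3

q6_size = gguf_weight_gib(27, 6.56)  # roughly 20-21 GiB of weights
budget = 2 * 24                      # two RTX 3090s, 24 GiB each

print(f"Q6_K weights: ~{q6_size:.1f} GiB of {budget} GiB total VRAM")
```

That leaves the other half of the VRAM budget for the KV cache, which is what makes the long context settings below feasible.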

I am sorry for not providing more info, but I didn't save the tg/pp for all of them. Nemotron ran at 80 tg and about 2000 pp with 100k context on vast.ai with 4x RTX 3090, and Qwen3.5-27B Q6 at 803 pp / 25 tg with 256k context, also on vast.ai.

I'll set it up locally, probably next week, for production use.

These are the commands I used (pretty much copied from unsloth page):

./llama.cpp/llama-server -hf unsloth/Qwen3.5-27B-GGUF:UD-Q6_K_XL --ctx-size 262144 --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.00 -ngl 999 
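For anyone reproducing this: llama-server exposes an OpenAI-compatible HTTP API (default port 8080), so once the command above is running you can query it with any OpenAI-style client. A minimal stdlib-only sketch, with the sampling values mirroring the command-line flags; the host/port is the llama.cpp default and the model label is just a tag, since the server only runs the one model it was launched with.

```python
# Minimal client sketch for the llama-server instance started above.
# Assumes the server is reachable at the default http://127.0.0.1:8080.
# top_k / min_p are llama.cpp extensions accepted alongside the
# standard OpenAI chat-completions fields.
import json
import urllib.request

def build_request(prompt: str) -> dict:
    """Build a chat-completions payload matching the CLI sampling flags."""
    return {
        "model": "Qwen3.5-27B",  # label only; the server runs one model
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "top_p": 0.95,
        "top_k": 20,
        "min_p": 0.0,
    }

def ask(prompt: str,
        url: str = "http://127.0.0.1:8080/v1/chat/completions") -> str:
    """Send one prompt to the running llama-server and return the reply text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Point `ask()` at a different host/port if you run the server remotely (e.g. on a vast.ai box with a forwarded port).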

P.S.

I am so glad I can actually replace API subscriptions (at least for the daily tasks), I'll continue using CODEX for complex tasks.

If I had the hardware that nemotron-3-super 120b requires, I would use it instead; it also always responded in my own language (Spanish), while the others responded in English.

submitted by /u/robertpro01
