Another appreciation post for qwen3.5 27b model

Reddit r/LocalLLaMA / 3/24/2026


Key Points

  • The Reddit poster ran comparison tests for local development use across Qwen3.5 27B (several quantizations), Qwen3.5 122B, Nemotron-3 Super 120B, gpt-oss 120b, and gpt-5.4 high, and reported the overall results.
  • Nemotron-3 Super 120B performed very well, on par with gpt-5.4 high, and Qwen3.5 27B also held its own, while gpt-oss 120b and Qwen3.5 122B performed comparatively worse.
  • On the poster's hardware (multiple RTX 3090s), Qwen3.5 27B at Q6 (Q6_K_XL) is practical for real development tasks, with no additional hardware investment needed, which they see as a major advantage.
  • They also shared runtime figures from vast.ai (context length, token generation speed) and an example llama.cpp/llama-server command, making the local setup easier to reproduce.
  • They suggest this could replace API subscriptions (at least for daily tasks), while continuing to use CODEX for complex tasks.

I tested qwen3.5 122b when it came out, and I really liked it: for my development tests it was on par with gemini 3 flash (my current AI tool for coding), so I was looking at investing in hardware. The problem is I'd need a new mobo and 1 (or 2) more 3090s, and the price is just too high right now.

I saw a lot of posts saying that qwen3.5 27b was better than 122b, which actually didn't make sense to me. Then I saw nemotron 3 super 120b, but people said it was not better than qwen3.5 122b, and I trusted them.

Yesterday and today I tested all these models:

"unsloth/Qwen3.5-27B-GGUF:UD-Q4_K_XL"
"unsloth/Qwen3.5-35B-A3B-GGUF:UD-Q4_K_XL"
"unsloth/Qwen3.5-122B-A10B-GGUF"
"unsloth/Qwen3.5-27B-GGUF:UD-Q6_K_XL"
"unsloth/Qwen3.5-27B-GGUF:UD-Q8_K_XL"
"unsloth/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF:UD-IQ4_XS"
"unsloth/gpt-oss-120b-GGUF:F16"

I also tested against gpt-5.4 high so I can compare them better.

To my surprise nemotron was a very, very good model, on par with gpt-5.4, and qwen3.5-27b did great as well.

Sadly (but also good news), gpt-oss 120b and qwen3.5 122b performed worse than the other two models (good because they would need more hardware).

So I can finally use "Qwen3.5-27B-GGUF:UD-Q6_K_XL" for real development tasks locally. The best part is I don't need to get more hardware (I already own 2x 3090).
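A rough back-of-envelope check of why the 27B Q6 fits in 2x 3090 (48 GiB total). This is a sketch, not a measurement: it assumes Q6_K stores roughly 6.56 bits per weight and ignores KV-cache and activation overhead, which grow with context length.

```python
# Back-of-envelope VRAM estimate for Qwen3.5-27B at Q6_K on 2x RTX 3090.
# Assumptions (not measurements): ~6.56 bits/weight for Q6_K; KV cache
# and activations are ignored, so real usage will be higher.

def gguf_weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GiB."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1024**3

q6_size = gguf_weight_gib(27, 6.56)  # roughly 20-21 GiB of weights
budget = 2 * 24                      # two RTX 3090s, 24 GiB each

print(f"Q6_K weights: ~{q6_size:.1f} GiB of {budget} GiB total VRAM")
```

That leaves the other half of the VRAM budget for the KV cache, which is what makes the long context settings below feasible.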

I am sorry for not providing more info, but I didn't save the tg/pp for all of them. Nemotron ran at 80 tg and about 2000 pp with 100k context on vast.ai with 4x RTX 3090, and Qwen3.5-27B Q6 at 803 pp / 25 tg with 256k context, also on vast.ai.

I'll set it up locally, probably next week, for production use.

These are the commands I used (pretty much copied from unsloth page):

./llama.cpp/llama-server -hf unsloth/Qwen3.5-27B-GGUF:UD-Q6_K_XL --ctx-size 262144 --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.00 -ngl 999 
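For anyone reproducing this: llama-server exposes an OpenAI-compatible HTTP API (default port 8080), so once the command above is running you can query it with any OpenAI-style client. A minimal stdlib-only sketch, with the sampling values mirroring the command-line flags; the host/port is the llama.cpp default and the model label is just a tag, since the server only runs the one model it was launched with.

```python
# Minimal client sketch for the llama-server instance started above.
# Assumes the server is reachable at the default http://127.0.0.1:8080.
# top_k / min_p are llama.cpp extensions accepted alongside the
# standard OpenAI chat-completions fields.
import json
import urllib.request

def build_request(prompt: str) -> dict:
    """Build a chat-completions payload matching the CLI sampling flags."""
    return {
        "model": "Qwen3.5-27B",  # label only; the server runs one model
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "top_p": 0.95,
        "top_k": 20,
        "min_p": 0.0,
    }

def ask(prompt: str,
        url: str = "http://127.0.0.1:8080/v1/chat/completions") -> str:
    """Send one prompt to the running llama-server and return the reply text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Point `ask()` at a different host/port if you run the server remotely (e.g. on a vast.ai box with a forwarded port).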

P.S.

I am so glad I can actually replace API subscriptions (at least for the daily tasks), I'll continue using CODEX for complex tasks.

If I had the hardware that nemotron-3-super 120b requires, I would use it instead; it also always responded in my own language (Spanish), while the others responded in English.

submitted by /u/robertpro01
