Qwen3.6-27B Uncensored Aggressive is out with K_P quants!

Reddit r/LocalLLaMA / 4/23/2026


Key Points

  • Qwen3.6's 27B "Uncensored Aggressive" variant has been released; it is claimed to be fully uncensored with 0/465 refusals while preserving the original, unmodified model behavior.
  • The release consists of GGUF quantizations including the K_P family (Q8_K_P through Q2_K_P); each model gets its own optimized profile, which is said to preserve quality (roughly a 1-2 quant-level uplift).
  • In the poster's testing it ran as expected with no looping or degradation, but it is more sensitive to prompt clarity than the 35B-A3B version; spelling out format, constraints, and scope keeps it stable.
  • Disabling "thinking" cannot be done with the Qwen3-generation soft switches (/think etc.); it requires editing the jinja template or passing the kwarg enable_thinking: false.
  • The model is 27B dense (64 layers), multimodal (text plus image and video), with a 262K context (extensible to ~1M via YaRN, with caveats).

The dense sibling of the 35B-A3B drop is here: Qwen3.6 27B Uncensored Aggressive is out!

Aggressive = no refusals. NO personality changes or alterations or any of that; it is the ORIGINAL Qwen release, just completely uncensored.

https://huggingface.co/HauhauCS/Qwen3.6-27B-Uncensored-HauhauCS-Aggressive

0/465 refusals*. Fully unlocked with zero capability loss.

From my own testing: 0 issues. No looping, no degradation, everything works as expected.

One thing I noticed vs the 35B-A3B: this model is a bit more sensitive to prompt clarity. Vague or under-specified prompts can drift, so spell out format, constraints, and scope and it stays on rails. FYI so you get the most out of it. From the way it handles social interactions, it reads like a coding/STEM-first model to me.

To disable "thinking", you need to edit the jinja template or pass the kwarg {"enable_thinking": false}. Heads up: Qwen3.6 doesn't support the /think and /no_think soft switches that Qwen3 had, so the kwarg is the way.
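To make the kwarg route concrete, here's a minimal sketch of a request body for an OpenAI-compatible llama.cpp server, which forwards chat_template_kwargs into the jinja template. The endpoint and model name are placeholders I made up; the one detail that matters is enable_thinking: false.

```python
# Sketch: disabling "thinking" via chat_template_kwargs against a llama.cpp
# server's OpenAI-compatible /v1/chat/completions endpoint. Model name and
# message are illustrative placeholders.
import json

payload = {
    "model": "qwen3.6-27b-uncensored-aggressive",  # placeholder
    "messages": [
        {"role": "user", "content": "Summarize YaRN in one sentence."}
    ],
    # Forwarded to the jinja chat template; equivalent to editing the
    # template to hard-code enable_thinking = false.
    "chat_template_kwargs": {"enable_thinking": False},
}

print(json.dumps(payload, indent=2))
```

Frontends that call apply_chat_template directly can pass the same kwarg there instead.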

What's included:

- Q8_K_P, Q6_K_P, Q5_K_P, Q4_K_P, IQ4_XS, Q3_K_P, IQ3_M, IQ3_XS, Q2_K_P, IQ2_M

- mmproj for vision support

- All quants generated with imatrix

K_P Quants recap (for anyone who missed the MoE releases): custom quants that use model-specific analysis to preserve quality where it matters most. Each model gets its own optimized profile. Effectively 1-2 quant levels of quality uplift at ~5-15% larger file size. Fully compatible with llama.cpp, LM Studio, anything that reads GGUF (Be forewarned, Ollama can be more difficult to get going).

Quick specs:

- 27B dense

- 64 layers — 16 × (3 × DeltaNet + 1 × Gated Attention) layout

- 48 linear attention + 16 full softmax attention (3:1 ratio, same as the MoE)

- 262K context (natively, extensible to ~1M with YaRN but careful — llama.cpp's YaRN is static and can hurt short-context perf)

- Multimodal (text + image + video)
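If you do want the ~1M YaRN extension despite the short-context caveat, a llama.cpp launch might look like the sketch below. The flags are llama.cpp's generic rope-scaling options; the 4x scale and model filename are illustrative assumptions, not the author's recommendation.

```shell
# Sketch: static YaRN context extension in llama.cpp (262144 * 4 ≈ 1M).
# Only enable this when you actually need >262K context, since llama.cpp's
# YaRN is static and can hurt short-context performance.
llama-server \
  -m Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-Q4_K_P.gguf \
  --jinja \
  --rope-scaling yarn \
  --rope-scale 4 \
  --yarn-orig-ctx 262144 \
  -c 1048576
```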

Sampling params I've been using:

temp=1.0, top_k=20, top_p=0.95, min_p=0, presence_penalty=0, repetition_penalty=1.0

(Qwen 3.6 updated their recommendations: presence_penalty is 0.0 for thinking mode, not 1.5 as it was for 3.5. Non-thinking mode still wants 1.5. Full settings, and my findings on them, are in the HF README.)

Note: Use the --jinja flag with llama.cpp. K_P quants may show as "?" in LM Studio's quant column; it's purely cosmetic, and the model loads and runs fine.
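Putting the sampling params and the --jinja flag together, an interactive llama.cpp invocation might look like this sketch (the model filename is a placeholder; the sampler flags are llama.cpp's standard CLI options):

```shell
# Sketch: llama-cli with the sampling parameters listed above plus --jinja
# so the model's chat template is applied.
llama-cli \
  -m Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-Q4_K_P.gguf \
  --jinja \
  --temp 1.0 \
  --top-k 20 \
  --top-p 0.95 \
  --min-p 0 \
  --presence-penalty 0 \
  --repeat-penalty 1.0
```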

HF's hardware compatibility widget also doesn't recognize K_P, so click "View +X variants" or go to Files and versions to see all downloads.

All my models: HuggingFace-HauhauCS

There's also a new Discord server; the link is in the HF repo. Feel free to join for updates, roadmaps, projects, or just to chat.

As always, hope everyone enjoys the release!

* = Tested with both automated and manual refusal benchmarks; none were found. This release has been on the quick side, though, so if you hit a refusal and it's obstructive to your use case, join the Discord and flag it so I can work on it in a future revision.

submitted by /u/hauhau901