The dense sibling of the 35B-A3B drop is here: Qwen3.6 27B Uncensored Aggressive is out!
Aggressive = no refusals. NO personality changes or alterations or any of that; it is the ORIGINAL Qwen release, just completely uncensored.
https://huggingface.co/HauhauCS/Qwen3.6-27B-Uncensored-HauhauCS-Aggressive
0/465 refusals*. Fully unlocked with zero capability loss.
From my own testing: 0 issues. No looping, no degradation, everything works as expected.
One thing I noticed vs the 35B-A3B: this model is a bit more sensitive to prompt clarity. Vague or under-specified prompts can drift, so spell out format, constraints, and scope and it stays on rails. FYI so you get the most out of it. From the way it handles social interactions, it seems like a 'coding/STEM-first' model to me.
To disable "thinking" you need to edit the Jinja template or pass the kwarg {"enable_thinking": false}. Heads up: Qwen3.6 doesn't support the /think and /no_think soft switches that Qwen3 had, so the kwarg is the way to go.
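If you're hitting an OpenAI-compatible endpoint (e.g. llama.cpp's llama-server), a minimal sketch of passing that kwarg per request; this assumes your server build forwards "chat_template_kwargs" into the Jinja template, and the model name and message are illustrative:

```python
import json

# Hypothetical request body for an OpenAI-compatible /v1/chat/completions
# endpoint. "chat_template_kwargs" (where supported) forwards extra
# variables into the Jinja chat template, here disabling "thinking".
payload = {
    "model": "Qwen3.6-27B-Uncensored-Aggressive",  # placeholder name
    "messages": [{"role": "user", "content": "Explain YaRN in one line."}],
    "chat_template_kwargs": {"enable_thinking": False},
}

body = json.dumps(payload)  # send this as the POST body
```

If your frontend doesn't expose chat_template_kwargs, editing the template itself (as noted above) is the fallback.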
What's included:
- Q8_K_P, Q6_K_P, Q5_K_P, Q4_K_P, IQ4_XS, Q3_K_P, IQ3_M, IQ3_XS, Q2_K_P, IQ2_M
- mmproj for vision support
- All quants generated with imatrix
K_P Quants recap (for anyone who missed the MoE releases): custom quants that use model-specific analysis to preserve quality where it matters most. Each model gets its own optimized profile. Effectively 1-2 quant levels of quality uplift at ~5-15% larger file size. Fully compatible with llama.cpp, LM Studio, anything that reads GGUF (Be forewarned, Ollama can be more difficult to get going).
Quick specs:
- 27B dense
- 64 layers: 16 × (3 × DeltaNet + 1 × Gated Attention) layout
- 48 linear attention + 16 full softmax attention (3:1 ratio, same as the MoE)
- 262K context natively, extensible to ~1M with YaRN (careful: llama.cpp's YaRN is static and can hurt short-context performance)
- Multimodal (text + image + video)
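For the YaRN extension mentioned in the specs, a sketch of a llama.cpp launch; the flag names are llama.cpp's RoPE-scaling options, but the model filename is a placeholder and the exact context you can afford depends on your hardware:

```bash
# Illustrative only: extend context via YaRN, telling llama.cpp the
# model's original training context so scaling is computed correctly.
# Remember this scaling is static, so short prompts may degrade.
llama-server -m Qwen3.6-27B-Q4_K_P.gguf \
  -c 1048576 --rope-scaling yarn --yarn-orig-ctx 262144 --jinja
```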
Sampling params I've been using:
temp=1.0, top_k=20, top_p=0.95, min_p=0, presence_penalty=0, repetition_penalty=1.0
(Qwen 3.6 updated their recommendations: presence_penalty is 0.0 for thinking mode in general, not 1.5 like 3.5 was. Non-thinking mode still wants 1.5. Full settings, and my findings on them, are in the HF README.)
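To make the two presence_penalty profiles concrete, here's a sketch of the settings above as sampler configs; the key names follow the common OpenAI-style parameter spelling, which your frontend may vary:

```python
# Sampler settings quoted above: shared base, then presence_penalty
# 0.0 for thinking mode vs 1.5 for non-thinking (Qwen 3.6's update).
BASE = {
    "temperature": 1.0,
    "top_k": 20,
    "top_p": 0.95,
    "min_p": 0.0,
    "repetition_penalty": 1.0,
}

THINKING = {**BASE, "presence_penalty": 0.0}
NON_THINKING = {**BASE, "presence_penalty": 1.5}
```

Pick the profile matching whether "thinking" is enabled; everything else stays the same between the two.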
Note: Use the --jinja flag with llama.cpp. K_P quants may show as "?" in LM Studio's quant column; it's purely cosmetic, and the model loads and runs fine.
HF's hardware compatibility widget also doesn't recognize K_P, so click "View +X variants" or go to Files and versions to see all downloads.
All my models: HuggingFace-HauhauCS
There's also a new Discord server; the link is in the HF repo. Feel free to join for updates, roadmaps, projects, or just to chat.
As always, hope everyone enjoys the release!
* = Tested with both automated and manual refusal benchmarks; none were found. The release has been on the quick side though, so if you hit a refusal that's obstructive to your use case, join the Discord and flag it so I can work on it in a future revision.
