Qwen3.6-35B-A3B Uncensored Aggressive is out with K_P quants!

Reddit r/LocalLLaMA / 4/17/2026


Key Points

  • Qwen3.6-35B-A3B “Aggressive” uncensored variant has been released on Hugging Face, claiming zero refusals (0/465) and no capability loss from the prior 3.5-35B release, while remaining otherwise unchanged.
  • The release includes multiple K_P quantized GGUF options (e.g., Q8/Q6/Q5/Q4_K_P and others) plus mmproj for vision support, with all quants generated using imatrix.
  • K_P quants are described as model-specific, analysis-driven quant profiles intended to preserve quality where it matters most, offering an estimated 1–2 quant-level quality uplift at roughly 5–15% larger files, and maintaining compatibility with llama.cpp and other GGUF readers.
  • Users are advised that disabling “thinking” requires either editing the llama.cpp Jinja template or passing {"enable_thinking": false}, and that LM Studio may show K_P as “?” in its quant column even though the model should run correctly.
  • The post also notes HF tooling may not recognize K_P in the hardware compatibility widget, and points users to the repo’s variant listings plus a new Discord for updates and roadmap discussion.

The Qwen3.6 update is here. 35B-A3B Aggressive variant, same MoE size as my 3.5-35B release but on the newer 3.6 base.

Aggressive = no refusals. It has NO personality changes or alterations; it is the ORIGINAL Qwen release, just completely uncensored.

https://huggingface.co/HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive

0/465 refusals. Fully unlocked with zero capability loss.

From my own testing: 0 issues. No looping, no degradation, everything works as expected.

To disable "thinking", edit the Jinja template or simply pass the kwarg {"enable_thinking": false}.
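If you're running the model behind llama-server's OpenAI-compatible endpoint, one way to send that kwarg per request is via a `chat_template_kwargs` field in the request body (supported in recent llama.cpp builds; the model name and example message here are just placeholders). A minimal sketch:

```python
import json

# Build an OpenAI-style chat completion payload for llama-server.
# "chat_template_kwargs" is forwarded into the Jinja chat template,
# which is how {"enable_thinking": false} reaches the model.
payload = {
    "model": "Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive",
    "messages": [{"role": "user", "content": "Hello!"}],
    "chat_template_kwargs": {"enable_thinking": False},
}

body = json.dumps(payload)
```

POST `body` to your llama-server's `/v1/chat/completions` endpoint with `Content-Type: application/json`.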

What's included:

- Q8_K_P, Q6_K_P, Q5_K_P, Q4_K_P, Q4_K_M, IQ4_NL, IQ4_XS, Q3_K_P, IQ3_M, Q2_K_P, IQ2_M

- mmproj for vision support

- All quants generated with imatrix

K_P Quants recap (for anyone who missed the 122B release): custom quants that use model-specific analysis to preserve quality where it matters most. Each model gets its own optimized profile. Effectively 1-2 quant levels of quality uplift at ~5-15% larger file size. Fully compatible with llama.cpp, LM Studio, and anything else that reads GGUF (Ollama can be trickier to get going).

Quick specs:

- 35B total / ~3B active (MoE — 256 experts, 8 routed per token)

- 262K context

- Multimodal (text + image + video)

- Hybrid attention: linear + softmax (3:1 ratio)

- 40 layers

Some of the sampling params I've been using during testing:

temp=1.0, top_k=20, repeat_penalty=1, presence_penalty=1.5, top_p=0.95, min_p=0

But definitely check the official Qwen recommendations too as they have different settings for thinking vs non-thinking mode :)
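Put together as a llama.cpp command line (the GGUF filename and context size here are illustrative; the sampling flags mirror the values above), one way to launch would be:

```shell
# Illustrative llama-cli invocation; swap in the filename of the quant
# you actually downloaded. --jinja applies the model's chat template.
llama-cli -m Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Q4_K_P.gguf \
  --jinja \
  --temp 1.0 \
  --top-k 20 \
  --top-p 0.95 \
  --min-p 0 \
  --repeat-penalty 1.0 \
  --presence-penalty 1.5 \
  -c 32768
```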

Note: Use the --jinja flag with llama.cpp. K_P quants may show as "?" in LM Studio's quant column; it's purely cosmetic, and the model loads and runs fine.

HF's hardware compatibility widget also doesn't recognize K_P so click "View +X variants" or go to Files and versions to see all downloads.

All my models: HuggingFace-HauhauCS

Also new: there's a Discord now as a lot of people have been asking :) Link is in the HF repo, feel free to join for updates, roadmaps, projects, or just to chat.

Hope everyone enjoys the release.

submitted by /u/hauhau901