Qwen3.5-122B-A10B Uncensored (Aggressive) — GGUF Release + new K_P Quants

Reddit r/LocalLLaMA / 3/22/2026

The big one is (finally) here. Qwen3.5-122B-A10B Aggressive is out!

Aggressive = no refusals. It has NO personality changes or alterations; it is the ORIGINAL Qwen release, just completely uncensored.

https://huggingface.co/HauhauCS/Qwen3.5-122B-A10B-Uncensored-HauhauCS-Aggressive

0/465 refusals. Fully unlocked with zero capability loss.

This one was absolutely brutal. Several weeks of literal nonstop work, and lots of obstacles that were luckily overcome. From my own testing: 0 issues. No looping, no degradation, everything works as expected.

To disable "thinking" you need to edit the jinja template or simply use the kwarg '{"enable_thinking": false}'
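As a sketch of the kwarg route: with llama.cpp's OpenAI-compatible llama-server (launched with --jinja), extra template variables can typically be forwarded per request via a "chat_template_kwargs" field. The field name and endpoint here are assumptions based on llama-server's API; verify against your llama.cpp version.

```python
import json

# Build an OpenAI-style chat request that passes the kwarg from the post.
# "chat_template_kwargs" forwards extra variables into the jinja chat
# template, which is where enable_thinking is consumed.
payload = {
    "model": "Qwen3.5-122B-A10B-Uncensored-HauhauCS-Aggressive",
    "messages": [{"role": "user", "content": "Hello!"}],
    "chat_template_kwargs": {"enable_thinking": False},
}

body = json.dumps(payload)
print(body)
```

You would POST this body to the server's /v1/chat/completions endpoint; editing the jinja template directly achieves the same thing permanently.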

New: K_P quants

This release introduces new K_P ("Perfect"; don't judge, I literally couldn't come up with anything else and didn't want to overlap with Unsloth's XL) quantizations. These use model-specific analysis to selectively preserve quality where it matters most; for each model I tune its own optimized profile. A K_P quant effectively gives you 1-2 quant levels better quality at only ~5-15% larger file size: Q4_K_P performs closer to Q6_K. Fully compatible with llama.cpp, LM Studio, and anything else that reads GGUF, but be forewarned: Ollama can be more difficult to get going.
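To put the size tradeoff in concrete terms, here is a tiny helper that just restates the post's ~5-15% overhead claim as arithmetic (the 70 GiB example input is hypothetical, not an actual file size from this release):

```python
def kp_size_range(base_gib: float) -> tuple[float, float]:
    """Estimated file-size range for a K_P variant, given the size of the
    corresponding standard K quant, using the post's ~5-15% overhead claim."""
    return (base_gib * 1.05, base_gib * 1.15)

# e.g. if a hypothetical Q4_K_M file were 70 GiB:
low, high = kp_size_range(70.0)
print(f"Q4_K_P expected between {low:.1f} and {high:.1f} GiB")
```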

What's included:

- Q8_K_P, Q6_K_P, Q6_K, Q5_K_M, Q4_K_P, Q4_K_M, IQ4_XS, Q3_K_M, Q3_K_P, IQ3_M, IQ3_XXS, IQ2_M (moving forward I will retire the standard Q8_0 and Q6_K and focus on the K_P variants, as they're net superior)

- mmproj for vision support

- All quants generated with imatrix

- No BF16 this time — it's ~250GB and I'd rather use that HF space for an entire new model

(Gemma3 is next — a lot of you have been asking)

Nemotron3 is also 'done', but I'm currently struggling with the RL on it: I can either remove it and COMPLETELY uncensor everything at the cost of 1-2% capability damage, or leave those bits in and preserve lossless uncensoring at about 2/465 'refusals'. This needs some extra time and work from me, and I'm unsure it currently deserves that (the models perform subpar compared to the competition).

Quick specs:

- 122B total / ~10B active (MoE — 256 experts, 8+1 active per token)

- 262K context

- Multimodal (text + image + video)

- Hybrid attention: Gated DeltaNet + softmax (3:1 ratio)

- 48 layers
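For anyone sizing hardware, a rough back-of-the-envelope from the specs above: file size is roughly params x bits-per-weight / 8. The ~4.8 bits-per-weight figure for Q4_K_M below is a commonly cited ballpark, an assumption on my part rather than a number from this release:

```python
# Rough size estimate from the post's specs: 122B total, ~10B active.
TOTAL_PARAMS = 122e9
ACTIVE_PARAMS = 10e9

def est_gib(params: float, bits_per_weight: float) -> float:
    """Approximate weight size in GiB at a given bits-per-weight."""
    return params * bits_per_weight / 8 / 2**30

q4_km = est_gib(TOTAL_PARAMS, 4.8)           # roughly 68 GiB of weights
active_frac = ACTIVE_PARAMS / TOTAL_PARAMS   # ~8% of weights used per token
print(f"~{q4_km:.0f} GiB at ~4.8 bpw, ~{active_frac:.0%} active per token")
```

The MoE structure is why this runs faster than a dense 122B: only ~8% of the weights are active per token, though all of them still need to fit in memory.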

Sampling params I've been using:

temp=1.0, top_k=20, repeat_penalty=1, presence_penalty=1.5, top_p=0.95, min_p=0

But definitely check the official Qwen recommendations too, as they have different settings for thinking vs non-thinking mode :)
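For reference, those settings map onto llama-cpp-python style keyword arguments roughly as follows. This is a sketch; the parameter names are assumed from that library's API, so check them against your client before relying on it:

```python
# Sampling settings from the post, expressed as llama-cpp-python style
# keyword arguments (names assumed; verify against your client's API).
sampling = {
    "temperature": 1.0,
    "top_k": 20,
    "top_p": 0.95,
    "min_p": 0.0,
    "repeat_penalty": 1.0,    # 1.0 = effectively disabled
    "presence_penalty": 1.5,
}

# e.g.: llm.create_chat_completion(messages=msgs, **sampling)
print(sampling)
```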

Note: Use the --jinja flag with llama.cpp. K_P quants may show as "?" in LM Studio's quant column; it's purely cosmetic, and the model loads and runs fine.

Previous Qwen3.5 releases:

- Qwen3.5-4B Aggressive

- Qwen3.5-9B Aggressive

- Qwen3.5-27B Aggressive

- Qwen3.5-35B-A3B Aggressive

All my models: HuggingFace-HauhauCS

Hope everyone enjoys the release. Let me know how it runs for you.

submitted by /u/hauhau901