Qwen3.6-35B-A3B-Plus-Uncensored-Wasserstein (neuron level surgery)

Reddit r/LocalLLaMA / 4/22/2026

💬 Opinion · Tools & Practical Usage · Models & Research

Key Points

  • The author reports that in a Qwen3.6 MoE model, a large share of neurons can “die” (reach exact zero values) at the per-neuron level, unlike Qwen3.5 9B where no zero neurons were observed.
  • They suspect this neuron death may contribute to LLM quality degradation during training and note independent confirmation of the issue by a hiring company using different detection methods.
  • To mitigate the problem, they performed low-level binary editing to restore dead neurons by copying weights from healthy neighboring neurons and applying linear interpolation.
  • The article provides fixed GGUF and fp8 safetensors model files, plus a conversion script from Q8_0 GGUF to safetensors, and claims the FP8 version keeps gradients alive.
  • They recommend specific quantization formats (e.g., MXFP4_MOE and Q8_0) and share related prompt/template resources for running the model.

Hello everyone. During a data-debugging session at the per-tensor and per-neuron level, I found that neurons in the tensor layers of a MoE model can die (have exactly zero values). Here's the log.

For example, in blk.0.ffn_gate_exps.weight and blk.0.ffn_up_exps.weight of the Qwen3.6 35B A3B Q8_0 quant, I found that 40% of the neurons are zero.

In Qwen3.5 9B I didn't find any zero blocks; every block contains nonzero values.
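The author inspected Q8_0 blocks directly at the binary level; as an illustration only, here is a minimal numpy sketch of the same idea on a dequantized weight matrix, treating a "dead neuron" as an output row that is all exactly zero (the function name and toy data are my own, not from the post):

```python
import numpy as np

def zero_neuron_fraction(weight: np.ndarray) -> float:
    """Fraction of output neurons (rows) whose weights are all exactly zero."""
    zero_rows = np.all(weight == 0, axis=-1)
    return float(zero_rows.mean())

# Toy example: a 10x8 weight matrix with 4 dead (all-zero) rows.
w = np.random.randn(10, 8).astype(np.float32)
w[[1, 3, 5, 7]] = 0.0
print(zero_neuron_fraction(w))  # 0.4
```

On a real GGUF you would run a check like this per tensor after dequantizing each expert matrix.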

I don't know why this happens. I've never trained LLMs myself, but the problem is real: a company I'm interviewing with independently confirmed these findings using different detection methods. I suspect this is a major reason why LLMs degrade during training.
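One toy way to see why all-zero rows matter for training (my illustration, not the author's analysis): in a plain linear layer y = W @ x, a dead row still receives a weight gradient, but it contributes nothing to the gradient flowing back to the input, so it silently weakens backpropagation through that layer:

```python
import numpy as np

# Hand-rolled backprop through y = W @ x to show the effect of a dead row.
rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W = rng.standard_normal((3, 4))
W[1] = 0.0                     # "dead" neuron: its entire weight row is zero

y = W @ x
g_y = np.ones_like(y)          # pretend upstream gradient dL/dy = 1

g_W = np.outer(g_y, x)         # dL/dW = g_y x^T: nonzero even for the dead row
g_x = W.T @ g_y                # dL/dx = W^T g_y: the dead row adds exactly zero

print(np.allclose(g_W[1], x))  # True
```

Real MoE FFN blocks (gated SiLU, expert routing) are more involved than this, so treat it only as intuition for why zeroed weights degrade gradient flow.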

I fixed the model as much as I could at the binary level on the Google Colab free-tier CPU, restoring the dead neurons (7.5 million zero blocks in the Q8 quant) by copying binary weight data from healthy neighbouring neurons into the dead ones and applying linear interpolation.
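The actual repair was done on raw Q8_0 blocks; as a rough sketch of the neighbour-copy-plus-interpolation idea on a dequantized matrix (function name and fallback behaviour at the edges are my assumptions):

```python
import numpy as np

def heal_dead_neurons(weight: np.ndarray) -> np.ndarray:
    """Replace all-zero rows by linearly interpolating between the nearest
    healthy rows above and below; at the edges, fall back to a plain copy."""
    w = weight.copy()
    is_dead = np.all(w == 0, axis=-1)
    dead = np.flatnonzero(is_dead)
    healthy = np.flatnonzero(~is_dead)
    for i in dead:
        below = healthy[healthy < i]
        above = healthy[healthy > i]
        if below.size and above.size:
            lo, hi = below[-1], above[0]
            t = (i - lo) / (hi - lo)          # interpolation weight in (0, 1)
            w[i] = (1 - t) * w[lo] + t * w[hi]
        elif below.size:
            w[i] = w[below[-1]]               # copy nearest healthy neighbour
        elif above.size:
            w[i] = w[above[0]]
    return w
```

For example, a dead row sitting exactly between two healthy rows gets their midpoint.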

Here's the fixed GGUF model: https://huggingface.co/LuffyTheFox/Qwen3.6-35B-A3B-Plus-Uncensored-Wasserstein-GGUF

And a benchmark from a user: https://huggingface.co/LuffyTheFox/Qwen3.6-35B-A3B-Plus-Uncensored-Wasserstein-GGUF/discussions/1#69e772a7b01172a7d35fb655

And the fp8_e4m3fn .safetensors version: https://huggingface.co/LuffyTheFox/Qwen3.6-35B-A3B-Plus-Uncensored-Wasserstein-Safetensors

I converted the Q8_0 GGUF to .safetensors with this script: https://huggingface.co/LuffyTheFox/Qwen3.6-35B-A3B-Plus-Uncensored-Wasserstein-Safetensors/raw/main/gguf_to_safetensors.py

The uncensored FP8 .safetensors version is trainable: its gradients stay alive, with no zeros.

The model is based on this one: https://huggingface.co/HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive . Thanks to HauhauCS for the amazing work.

System prompt: https://pastebin.com/pU25DVnB

Chat template: https://pastebin.com/Dy2fmmpN

Recommended quants: MXFP4_MOE and Q8_0

Recommended Settings (LM Studio):

| Parameter | Value |
| --- | --- |
| Temperature | 0.7 |
| Top K Sampling | 20 |
| Presence Penalty | 1.5 |
| Repeat Penalty | Disabled |
| Top P Sampling | 0.8 |
| Min P Sampling | 0 |
| Seed | 42 |
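For llama.cpp users, the LM Studio settings above map roughly onto llama-cli's sampler flags like this (the model filename is a placeholder; in llama.cpp, a repeat penalty of 1.0 means disabled):

```shell
# Approximate llama.cpp equivalent of the LM Studio settings above.
llama-cli -m Qwen3.6-35B-A3B-Plus-Uncensored-Wasserstein-Q8_0.gguf \
  --temp 0.7 --top-k 20 --top-p 0.8 --min-p 0 \
  --presence-penalty 1.5 --repeat-penalty 1.0 --seed 42
```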

Enjoy ^_^

PS: The Qwen team released a 3.6 27B version. I can't run it on my RTX 3060 12GB, but I will heal it for the community and release it after HauhauCS's 27B uncensored release.

submitted by /u/EvilEnginer