Hello everyone. I found and fixed a training bug in the Qwen3.5 35B A3B model.
Here's my fixed version: https://huggingface.co/LuffyTheFox/Qwen3.5-35B-A3B-Uncensored-FernflowerAI-GGUF
An upgraded system prompt that unlocks deep thinking (works great with this model):
https://pastebin.com/pU25DVnB
Chat template: https://pastebin.com/uk9ZkxCR (supports tool calling)
Recommended Settings (LM Studio):

| Setting | Value |
|---|---|
| Temperature | 0.7 |
| Top K Sampling | 20 |
| Presence Penalty | 1.5 |
| Top P Sampling | 0.8 |
| Min P Sampling | 0 |
| Seed | 3407 |
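If you run the model through LM Studio's OpenAI-compatible local server instead of the GUI, the same settings can be passed per request. This is a minimal sketch: the endpoint is LM Studio's default, but the model id is a placeholder - use whatever name LM Studio shows for your load.

```python
# Sketch: the recommended sampler settings sent to LM Studio's
# OpenAI-compatible server (default address http://localhost:1234/v1).
import json
import urllib.request

payload = {
    "model": "qwen3.5-35b-a3b-uncensored",  # hypothetical model id
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
    "top_k": 20,
    "presence_penalty": 1.5,
    "top_p": 0.8,
    "min_p": 0,
    "seed": 3407,
}
req = urllib.request.Request(
    "http://localhost:1234/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(req)  # uncomment with a running server
```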
History:
I've been using Qwen 3.5 35B A3B (the uncensored version by HauhauCS) for a while. It's an incredible model - uncensored, MoE with 256 experts, a hybrid DeltaNet + attention design, 40 layers, fresh knowledge, and it runs fine on my RTX 3060 12GB GPU. But something was off. On short prompts it worked fine. In long conversations it started "philosophizing" - losing context, repeating itself, writing broken code with strange comments.
I spent two weeks digging through the weights.
What I found:
Two tensors. In blocks 36 and 37. ssm_conv1d.weight.
Their scale was ~60% higher than normal (σ = 0.102 vs. a median of 0.063). Because of how AdamW works, rarely-activated experts in the last layers get a huge effective learning rate - their weights drift.
In a recurrent architecture like DeltaNet, this kills the hidden state. The model forgets context after a few tokens.
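The scan boils down to comparing each tensor's standard deviation against the median across blocks. Here's a sketch of that check, with synthetic NumPy arrays standing in for the real weights (loading from GGUF/safetensors is left out; the `blk.N.ssm_conv1d.weight` names mirror the post, and the 40%-above-median threshold is my assumption):

```python
# Sketch: flag ssm_conv1d.weight tensors whose scale drifted well above
# the median. Synthetic data stands in for the real 40-block checkpoint.
import numpy as np

rng = np.random.default_rng(0)
# Fake per-block tensors: most at sigma ~0.063, blocks 36/37 drifted to ~0.102.
tensors = {f"blk.{i}.ssm_conv1d.weight": rng.normal(0, 0.063, 2048)
           for i in range(40)}
tensors["blk.36.ssm_conv1d.weight"] = rng.normal(0, 0.102, 2048)
tensors["blk.37.ssm_conv1d.weight"] = rng.normal(0, 0.102, 2048)

stds = {name: float(w.std()) for name, w in tensors.items()}
median = float(np.median(list(stds.values())))

# Anything >40% above the median scale is suspicious (assumed threshold).
outliers = sorted(n for n, s in stds.items() if s > 1.4 * median)
print(outliers)  # the two drifted blocks
```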
Surprisingly, I didn't find any issues in Gemma 4 26B A4B - all scales in that model were correct.
What I did:
I scaled the broken tensors back to normal. Nothing else. The 489 other tensors were left untouched - their scale is architectural (gate_inp, etc.).
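The fix itself is just a pure rescale: multiply the drifted tensor so its standard deviation matches the healthy median, changing nothing else. A minimal sketch, again with a synthetic array standing in for the real weight:

```python
# Sketch: rescale a drifted ssm_conv1d.weight back to the healthy scale.
import numpy as np

rng = np.random.default_rng(1)
healthy_sigma = 0.063                    # median scale from the scan
w = rng.normal(0, 0.102, 2048)           # a drifted tensor

w_fixed = w * (healthy_sigma / w.std())  # pure rescale, no other change
print(round(float(w_fixed.std()), 3))    # -> 0.063
```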
Results:
- Error reduction: 88.6%.
- Long conversations now stay coherent.
- Code generation works.
- No more "philosophizing", even with my complex System Prompt.
What I learned:
One bug. Two tensors. 64GB of model. And the entire potential of the most complex open-weight architecture was locked behind it.
If you're using MoE + recurrent hybrids (DeltaNet, Mamba, etc.), check your last blocks. AdamW might have silently broken them.
Enjoy ^_^


