Qwen3.5-35B-A3B-Uncensored-FernflowerAI-GGUF

Reddit r/LocalLLaMA / 4/9/2026


Key Points

  • The author reports finding and fixing a training/weight issue in the Qwen3.5-35B-A3B “Uncensored FernflowerAI” GGUF model, caused by two tensors (`ssm_conv1d.weight` in blocks 36–37) having unusually high scale (~60% higher than normal).
  • They explain that, due to AdamW dynamics in the final layers, the incorrect tensor scaling can lead rare experts to drift, which then breaks the hidden state in the model’s recurrent-style DeltaNet hybrid architecture—manifesting as context loss, repetition, and broken code during long chats.
  • A repaired variant is shared on Hugging Face, alongside an upgraded system prompt intended to “unlock deep thinking,” plus a chat template that supports tool calling.
  • Recommended LM Studio sampling settings (temperature/top-k/top-p/penalties/seed) are provided, and the author claims large improvements including an 88.6% error reduction and better long-conversation coherence and code generation.
  • They suggest users of MoE + recurrent hybrids (e.g., DeltaNet, Mamba) should verify the last blocks’ tensor scales because the problem could be silent and widely impactful.

Hello everyone. I found and fixed a training bug in the Qwen3.5 35B A3B model.

Here is my fixed version: https://huggingface.co/LuffyTheFox/Qwen3.5-35B-A3B-Uncensored-FernflowerAI-GGUF

Upgraded system prompt that unlocks deep thinking (works great with this model):
https://pastebin.com/pU25DVnB

Chat template: https://pastebin.com/uk9ZkxCR (supports tool calling)

Recommended Settings (LM Studio):

Temperature 0.7
Top K Sampling 20
Top P Sampling 0.8
Min P Sampling 0
Presence Penalty 1.5
Seed 3407
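If you drive the model through an OpenAI-compatible local server instead of the LM Studio UI, the same settings map onto standard sampling parameters. A minimal sketch (the model name is a placeholder; `top_k` and `min_p` are local-server extensions, not official OpenAI fields):

```python
# Hypothetical request payload mirroring the recommended LM Studio settings.
sampling = {
    "model": "qwen3.5-35b-a3b-uncensored-fernflowerai",  # placeholder model name
    "temperature": 0.7,
    "top_k": 20,            # extension supported by many local servers
    "top_p": 0.8,
    "min_p": 0.0,           # likewise a local-server extension
    "presence_penalty": 1.5,
    "seed": 3407,
}
```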

History:

I've been using Qwen 3.5 35B A3B (the uncensored version by HauhauCS) for a while. It's an incredible model - uncensored, MoE with 256 experts, a hybrid DeltaNet + Attention design, 40 layers, runs fine on my RTX 3060 12GB GPU, and has fresh knowledge. But something was off. On short prompts it behaved normally. On long conversations it started "philosophizing" - losing context, repeating itself, writing broken code with strange comments.

I spent two weeks digging through the weights.

What I found:

Two tensors. In blocks 36 and 37. ssm_conv1d.weight.

Their scale was ~60% higher than normal (σ=0.102 vs median 0.063). Because of how AdamW works, rare experts in the last layers get a huge effective learning rate - their weights drift.
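The check itself is simple: compare each tensor's standard deviation against the median across its group and flag outliers. A sketch with synthetic data (in practice you would read the tensors from the GGUF file; names follow the ones reported above):

```python
import numpy as np

def flag_scale_outliers(tensors, ratio_threshold=1.5):
    """Flag tensors whose standard deviation is far above the group median.

    `tensors` maps tensor names to weight arrays; here they are synthetic,
    but in a real check you would load them from the GGUF file.
    """
    stds = {name: float(np.std(w)) for name, w in tensors.items()}
    median_std = float(np.median(list(stds.values())))
    return {name: s for name, s in stds.items() if s > ratio_threshold * median_std}

# Synthetic example: one conv weight with ~60% higher spread (0.102 vs 0.063),
# matching the anomaly described in the post.
rng = np.random.default_rng(0)
tensors = {f"blk.{i}.ssm_conv1d.weight": rng.normal(0, 0.063, 512)
           for i in range(30, 36)}
tensors["blk.36.ssm_conv1d.weight"] = rng.normal(0, 0.102, 512)
print(flag_scale_outliers(tensors))  # only blk.36 is flagged
```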

In a recurrent architecture like DeltaNet, this kills the hidden state. The model forgets context after a few tokens.
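To see why a scale error compounds in a recurrent block, here is a toy linear recurrence (not the actual DeltaNet update, just an illustration): because the state is linear in the input projection, inflating that weight by 60% pushes the whole hidden-state distribution ~1.6x out of range for every downstream layer.

```python
import numpy as np

def run_recurrence(w_scale, steps=256, decay=0.9):
    """Toy recurrence h <- decay*h + (w_scale*W) @ x.

    Illustrative only: shows how a mis-scaled projection shifts the
    hidden-state magnitude, not the real DeltaNet equations.
    """
    rng = np.random.default_rng(1)
    W = rng.normal(0, 0.063, (16, 16))
    h = np.zeros(16)
    for _ in range(steps):
        x = rng.normal(0, 1.0, 16)
        h = decay * h + (w_scale * W) @ x
    return float(np.linalg.norm(h))

baseline = run_recurrence(1.0)
inflated = run_recurrence(1.6)   # ~60% higher scale, as in blocks 36-37
print(inflated / baseline)       # -> ~1.6: the state drifts out of distribution
```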

Surprisingly, I didn't find any issues in Gemma 4 26B A4B - all scales in that model were correct.

What I did:

I scaled the broken tensors back to normal. Nothing else. The other 489 tensors were left untouched - their scale is architectural (gate_inp, etc.).
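The repair amounts to multiplying each broken tensor by the ratio of the healthy median std to its own std, which preserves the weight pattern and only corrects the magnitude. A sketch of that idea (not the author's exact script):

```python
import numpy as np

def rescale_tensor(w, target_std):
    """Rescale a weight tensor so its std matches the healthy median,
    leaving the relative pattern of the weights untouched."""
    current = float(np.std(w))
    return w * (target_std / current)

# Synthetic "broken" tensor at sigma ~0.102, rescaled to the median 0.063.
rng = np.random.default_rng(2)
broken = rng.normal(0, 0.102, 1024)
fixed = rescale_tensor(broken, target_std=0.063)
print(round(float(np.std(fixed)), 3))  # -> 0.063
```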

Results:

  • Error reduction: 88.6%.
  • Long conversations now stay coherent.
  • Code generation works.
  • No more "philosophizing", even with my complex System Prompt.

What I learned:

One bug. Two tensors. 64GB of model. And the entire potential of the most complex open-weight architecture was locked behind it.

If you're using MoE + recurrent hybrids (DeltaNet, Mamba, etc.), check your last blocks. AdamW might have silently broken them.

Enjoy ^_^

submitted by /u/EvilEnginer