FernflowerAI-35B-A3B-KL-ReLU-GGUF + Apple MLX

Reddit r/LocalLLaMA / 4/12/2026


Key Points

  • FernflowerAI-35B-A3B-KL-ReLU-GGUF is a repaired Qwen 3.5 35B A3B uncensored model released with additional KL-divergence and ReLU-asymmetry calibration diagnostics to address subtler weight-distribution drift beyond earlier context-collapse fixes.
  • The author reports that two broken tensors from the original training (notably in ssm_conv1d.weight for blocks 36–37) were first fixed by scaling back, resolving major context collapse and looping, but further testing revealed other tensors whose scale/saturation looked fine while distribution shape drifted.
  • KL divergence is used to restore the distribution shape of drifting tensors without altering scale or saturation, and the ReLU asymmetry probe is included to detect mean drift that can accumulate under AdamW (though it did not trigger for this specific model).
  • Quantitatively, the average KL divergence drops from 0.1036 to 0.0297, with 71.3% KL reduction, and the number of repaired tensors increases from 2 to 11 after the expanded diagnostic criteria.
  • The post also provides Apple MLX availability, including an 8-bit MLX version from froggeric and a “final release” safetensors/MLX version planned via a related discussion.

Qwen 3.5 35B A3B Uncensored HauhauCS (repaired) -> (now with KL + ReLU calibration)

Model available here: https://huggingface.co/LuffyTheFox/FernflowerAI-35B-A3B-KL-ReLU-GGUF

Repair summary: link

Extra information about how Qwen 3.5 35B got broken (and how I fixed it): link

V1 Apple MLX version (thanks to froggeric): https://huggingface.co/froggeric/Qwen3.5-35B-A3B-Uncensored-FernflowerAI-MLX-8bit

V2 Apple MLX version (final release): coming soon, discussion here

History:
Hello everyone. A few days ago I released a fixed version of Qwen 3.5 35B A3B Uncensored by HauhauCS. Two tensors that Alibaba shipped broken in the Qwen 3.5 35B A3B model (ssm_conv1d.weight in blocks 36-37, apparently the result of an AdamW optimizer bug during a heavy, complex training run) were scaled back to normal. That fixed the major context collapse and looping. But after more testing, I found that some other tensors (experts, attention projections) had a subtler problem: their overall scale and saturation looked fine, but the shape of their weight distribution was drifting away from the peer group. C1 and C2 didn't catch this; C3 (KL divergence) did.
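The "scaled back to normal" repair described above can be sketched as rescaling an outlier tensor's standard deviation to match the median of its peer group. This is a minimal illustration with NumPy on toy data, not the author's actual repair code; the function name and peer-group construction are hypothetical.

```python
import numpy as np

def rescale_to_peer_group(tensor, peer_tensors):
    """Shrink (or grow) a drifted tensor so its std matches the
    median std of its healthy peers, preserving the value pattern."""
    peer_stds = [float(np.std(p)) for p in peer_tensors]
    target_std = float(np.median(peer_stds))
    current_std = float(np.std(tensor))
    if current_std == 0.0:
        return tensor  # degenerate tensor, nothing to rescale
    return tensor * (target_std / current_std)

# Toy example: one tensor blown up ~8x relative to its peers.
rng = np.random.default_rng(0)
peers = [rng.normal(0, 0.02, 4096) for _ in range(8)]
broken = rng.normal(0, 0.16, 4096)
fixed = rescale_to_peer_group(broken, peers)
print(float(np.std(broken)), float(np.std(fixed)))  # std drops to the peer level
```

Note that this only corrects scale; as the post goes on to explain, a tensor can pass a scale check like this while its distribution *shape* has still drifted.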

So I added two more criteria to the diagnostic pass:

  • KL divergence - restores the distribution shape of tensors that drifted from their peer group without changing scale or saturation.
  • ReLU asymmetry - detects mean drift that AdamW can accumulate over time (didn't fire on this model, but the probe is there for others).
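To make the two criteria concrete, here is a minimal sketch of what such probes could look like: a KL divergence between a tensor's normalized weight histogram and the averaged histogram of its peer group, plus a ReLU-style asymmetry statistic that flags mean drift. All names, bin counts, and thresholds here are illustrative assumptions; the post does not include the actual diagnostic code.

```python
import numpy as np

def weight_histogram(t, bins=64, value_range=(-0.1, 0.1)):
    """Normalized weight-value histogram (a discrete probability distribution)."""
    h, _ = np.histogram(t, bins=bins, range=value_range)
    h = h.astype(np.float64) + 1e-12  # smoothing to avoid log(0)
    return h / h.sum()

def kl_divergence(p, q):
    """KL(P || Q) between two discrete distributions over the same bins."""
    return float(np.sum(p * np.log(p / q)))

def relu_asymmetry(t):
    """Positive minus negative mass, normalized: near 0 for a zero-centered
    tensor, drifts away from 0 if the optimizer accumulated mean drift."""
    pos = float(np.sum(np.maximum(t, 0)))
    neg = float(np.sum(np.maximum(-t, 0)))
    return (pos - neg) / (pos + neg + 1e-12)

rng = np.random.default_rng(1)
peers = [rng.normal(0, 0.02, 8192) for _ in range(8)]
peer_hist = np.mean([weight_histogram(p) for p in peers], axis=0)

# Same scale, different distribution shape: scale/saturation checks pass,
# but the KL probe against the peer-group histogram lights up.
drifted = rng.laplace(0, 0.02, 8192)

print(kl_divergence(weight_histogram(peers[0]), peer_hist))  # small: healthy
print(kl_divergence(weight_histogram(drifted), peer_hist))   # larger: shape drift
print(relu_asymmetry(drifted))  # near 0: no mean drift in this toy tensor
```

The key property this demonstrates is the one the post relies on: the drifted tensor has roughly the same scale as its peers, so only the distribution-shape (KL) criterion catches it.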

Results on this version:

| Metric | Before | After |
| --- | --- | --- |
| KL divergence (average) | 0.1036 | 0.0297 |
| KL reduction | n/a | 71.3% |
| Repaired tensors (C2 + C3) | 2 | 11 |

What this means for you:

  • The model was already stable after v1. Now it's tighter - fewer hidden distribution anomalies that could cause weird behavior on very long or complex tasks.
  • No new problems introduced. The 489 healthy tensors were left untouched.

Upgraded system prompt that unlocks deep thinking (works great with this model):
https://pastebin.com/pU25DVnB

Alternatively, you can use just this one line as the System Prompt, and append anything you want after it:
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.

Quantization script available here: https://pastebin.com/hXhcMJn9

Updated chat template: https://pastebin.com/uk9ZkxCR (with tool fixes from froggeric and disabled thinking)

Recommended Settings (LM Studio):

Temperature 0.7
Top K Sampling 20
Presence Penalty 1.5
Repeat Penalty Disabled or 1.0
Top P Sampling 0.8
Min P Sampling 0
Seed 3407
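For API-based loaders, the LM Studio settings above map onto the usual sampling parameters of an OpenAI-compatible request body. This is a sketch of such a payload; the model identifier is illustrative, and `top_k`, `min_p`, and `repeat_penalty` are llama.cpp-style extensions that not every server accepts.

```python
# Recommended settings expressed as an OpenAI-compatible request payload.
payload = {
    "model": "fernflowerai-35b-a3b-kl-relu",  # illustrative model id
    "temperature": 0.7,
    "top_k": 20,
    "top_p": 0.8,
    "min_p": 0.0,
    "presence_penalty": 1.5,
    "repeat_penalty": 1.0,  # 1.0 == disabled
    "seed": 3407,
    "messages": [
        {"role": "system",
         "content": "You are Qwen, created by Alibaba Cloud. "
                    "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}
print(payload["temperature"], payload["seed"])
```

Fixing the seed (3407) makes sampling reproducible across runs on the same backend, which is handy when comparing the v1 and v2 repairs.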

Enjoy ^_^

submitted by /u/EvilEnginer