AI Navigate

Qwen3.5-35B-A3B-Uncensored-Claude-Opus-4.6-Affine

Reddit r/LocalLLaMA / 3/21/2026


Key Points

  • The post introduces a merged model named Qwen3.5-35B-A3B-Uncensored-Claude-Opus-4.6-Affine, created by blending HauhauCS’s Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive with Jackrong’s Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled and applying a KL-divergence-based fusion.
  • It claims the model runs on an RTX 3060 12 GB and achieves 17-18 tokens per second without offloading, staying in the compressed IQ4_XS format.
  • The author fixed the first layer and late layers (blk.0, blk.35, blk.39) and stabilized attention/expert components to reduce issues after compression.
  • It provides usage guidance, including a system prompt and LM Studio sampling settings (temperature, Top K, Top P, seed, etc.), plus a demonstration prompt that builds an Arkanoid game in HTML5/JS in a Tron: Legacy style.

Hello everyone. Some people asked me to do a merge of the Qwen3.5-35B-A3B model, because it has only 3 billion active parameters and can run on an older GPU (RTX 3060 12 GB).

Introducing: https://huggingface.co/LuffyTheFox/Qwen3.5-35B-A3B-Uncensored-Claude-Opus-4.6-Affine

This model has been made via merging:

  1. The most popular model by HauhauCS on HuggingFace: https://huggingface.co/HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive
  2. And Qwen 3.5 35B A3B Claude Opus 4.6 distilled model by Jackrong: https://huggingface.co/Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled
  3. After merging, I ran a special script that added the "thinking skills" from the Jackrong model to the HauhauCS model and cleaned up any weirdness using a math method called KL divergence. All of this was done in the Google Colab free tier without unpacking the model - it stayed in the compressed IQ4_XS format.
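The KL-divergence cleanup mentioned in step 3 can be sketched as follows. This is a minimal illustration of the metric itself, not the author's actual Colab script: KL divergence measures how far the merged model's next-token distribution drifts from a reference model's, and the toy distributions below are made up.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) between two probability distributions, e.g. the
    per-token output distributions of a reference model (P) and a
    merged candidate (Q). Lower = the merge preserves behavior better.
    eps guards against log(0) on zero-probability tokens."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Toy next-token distributions from two hypothetical models
p = [0.7, 0.2, 0.1]    # reference (donor "thinking") model
q = [0.6, 0.25, 0.15]  # merged candidate
print(kl_divergence(p, q))  # small positive value: close, but not identical
print(kl_divergence(p, p))  # ~0: identical distributions
```

A merging script would compute this per layer (or per tensor) and nudge or reject blends that drift too far from the reference.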

Also I fixed:

  • The very first layer (blk.0) - this handles raw input, so it often gets messy
  • A few late layers (blk.35, blk.39) - these handle final output and often show problems after compression
  • Attention and expert parts - these are the most sensitive parts of the model
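The layer-fixing step above amounts to pinning certain tensors so they are kept verbatim rather than blended. A sketch of the selection logic, assuming llama.cpp's GGUF `blk.N.<part>` tensor-naming convention; the helper function and the exact set of "sensitive" part names are illustrative, not the author's code:

```python
# Blocks the author pinned: first layer and two late layers.
PINNED_BLOCKS = {"blk.0.", "blk.35.", "blk.39."}
# Attention and MoE-expert tensors (most sensitive to quantized merging).
SENSITIVE_PARTS = ("attn_", "ffn_gate_exps", "ffn_up_exps", "ffn_down_exps")

def keep_from_base(tensor_name: str) -> bool:
    """Return True if this tensor should be copied verbatim from the
    base model instead of being merged."""
    if any(tensor_name.startswith(b) for b in PINNED_BLOCKS):
        return True
    return any(part in tensor_name for part in SENSITIVE_PARTS)

print(keep_from_base("blk.0.attn_q.weight"))       # True: first layer pinned
print(keep_from_base("blk.12.ffn_up_exps.weight")) # True: expert tensor
print(keep_from_base("blk.12.ffn_norm.weight"))    # False: free to merge
```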

Results:

17-18 tokens per second on my RTX 3060 12 GB without offloading. It has skills in programming and writing, communicates in a short, natural, human-like way, and is uncensored.

For best model performance, please use the following settings in LM Studio 0.4.7 (build 4):

  1. Use this System Prompt: https://pastebin.com/pU25DVnB
  2. If you want to disable thinking use this chat template in LM Studio: https://pastebin.com/uk9ZkxCR
  3. Temperature: 0.7
  4. Top K Sampling: 20
  5. Repeat Penalty: (disabled) or 1.0
  6. Presence Penalty: 1.5
  7. Top P Sampling: 0.8
  8. Min P Sampling: 0.0
  9. Seed: 3407
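The same sampler settings can also be sent to LM Studio's OpenAI-compatible local server instead of set in the GUI. This request payload is a sketch: the model identifier is an assumption, and `repeat_penalty`/`min_p`/`top_k` are llama.cpp-style extensions rather than standard OpenAI fields (LM Studio passes them through to the sampler):

```python
import json

# Sampler settings from the post, expressed as a chat-completions payload.
payload = {
    "model": "qwen3.5-35b-a3b-uncensored-claude-opus-4.6-affine",  # assumed ID
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
    "top_k": 20,
    "top_p": 0.8,
    "min_p": 0.0,
    "presence_penalty": 1.5,
    "repeat_penalty": 1.0,  # 1.0 = effectively disabled
    "seed": 3407,
}
print(json.dumps(payload, indent=2))
```

POSTing this to the local server's `/v1/chat/completions` endpoint should reproduce the recommended sampling behavior.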

Here are the model's programming skills in action: https://pastebin.com/44VtLGxf

Via prompt:
"Write an Arkanoid game using HTML5 and Javascript. The game should be controlled with a mouse and include generated sounds and effects. The game should be in the style of the film Tron: Legacy."

I hope you like it ^_^. Please upvote if you like the model, so more people will see it.
Frankly speaking, this is the best local AI I have ever used in my practice, and I am very impressed with the results.

submitted by /u/EvilEnginer